-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Draft] Play around with index_mode in Fleet #147684
Conversation
The draft PR has several purpose * Better understand how the setup of data stream settings works on my end * Leave comments to further discuss the expected end behaviour * Learn more about the Fleet code base on my end This PR is not expected to be merged but to have conversations.
💛 Build succeeded, but was flaky
Failed CI StepsTest Failures
Metrics [docs]Unknown metric groupsESLint disabled in files
ESLint disabled line counts
Total ESLint disabled count
To update your PR or re-run it, just comment with: cc @ruflin |
The testing description in #140035 I think mostly answers my question around the manual testing of packages for these kind of scenarios. Seems the registry is the answer. |
Yes this is accurate. Fleet developers typically point at either the live epr.elastic.co registry in development, or use the one provided by the We also rely heavily on our test package fixtures + FTR tests here: https://github.com/elastic/kibana/tree/main/x-pack/test/fleet_api_integration/apis/fixtures/test_packages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ruflin - Took a pass at answering some of the questions in your comments. @nchaulet @hop-dev @juliaElastic you will probably have some additional insight to the specifics here as well, so feel free to chime in anywhere I may have missed something. Thanks all!
@@ -65,6 +67,7 @@ export const ExperimentDatastreamSettings: React.FunctionComponent<Props> = ({ | |||
typeof syntheticSourceExperimentalValue !== 'undefined' | |||
? syntheticSourceExperimentalValue | |||
: isSyntheticSourceEnabledByDefault, | |||
// I initially was looking for this but couldn't find it as I was looking for time_series. Should we align on naming? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good point about alignment here. I went with tsdb
because it seemed that's how the Elasticsearch teams were talking about the feature in their issues/PR's. Generally, though, I'm in favor of avoiding acronyms/abbreviations in code wherever possible so I'd be +1 for moving to time_series
in code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be fully honest, I find time_series
pretty confusing as all data streams are time series. But unfortunately it is what the Elasticsearch team has selected so at least in the code base, I would align it. What we show as "text" in the UI can be a different story.
// This sets the features on change, will it update data stream itself? | ||
// Is this were rollover should happen? | ||
// Ideally there would be a "data management" part in Fleet (or outside) where on a change | ||
// the policy API could just make a call with the requested change and would get back if it was applied or | ||
// not, not having to think about templates, data streams etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just the frontend state tracking of the opt-in state for each experimental indexing feature. The package_policy
API handles actually updating templates here:
kibana/x-pack/plugins/fleet/server/services/package_policy.ts
Lines 182 to 183 in 8652775
// Handle component template/mappings updates for experimental features, e.g. synthetic source | |
await handleExperimentalDatastreamFeatureOptIn({ soClient, esClient, packagePolicy }); |
kibana/x-pack/plugins/fleet/server/services/package_policies/experimental_datastream_features.ts
Lines 44 to 152 in 8652775
export async function handleExperimentalDatastreamFeatureOptIn({ | |
soClient, | |
esClient, | |
packagePolicy, | |
}: { | |
soClient: SavedObjectsClientContract; | |
esClient: ElasticsearchClient; | |
packagePolicy: PackagePolicy | NewPackagePolicy; | |
}) { | |
if (!packagePolicy.package?.experimental_data_stream_features) { | |
return; | |
} | |
// If we're performing an update, we want to check if we actually need to perform | |
// an update to the component templates for the package. So we fetch the saved object | |
// for the package policy here to compare later. | |
let installation; | |
if (packagePolicy.package) { | |
installation = await getInstallation({ | |
savedObjectsClient: soClient, | |
pkgName: packagePolicy.package.name, | |
}); | |
} | |
for (const featureMapEntry of packagePolicy.package.experimental_data_stream_features) { | |
const existingOptIn = installation?.experimental_data_stream_features?.find( | |
(optIn) => optIn.data_stream === featureMapEntry.data_stream | |
); | |
const isSyntheticSourceOptInChanged = | |
existingOptIn?.features.synthetic_source !== featureMapEntry.features.synthetic_source; | |
const isTSDBOptInChanged = existingOptIn?.features.tsdb !== featureMapEntry.features.tsdb; | |
if (!isSyntheticSourceOptInChanged && !isTSDBOptInChanged) continue; | |
const componentTemplateName = `${featureMapEntry.data_stream}@package`; | |
const componentTemplateRes = await esClient.cluster.getComponentTemplate({ | |
name: componentTemplateName, | |
}); | |
const componentTemplate = componentTemplateRes.component_templates[0].component_template; | |
if (isSyntheticSourceOptInChanged) { | |
const body = { | |
template: { | |
...componentTemplate.template, | |
mappings: { | |
...componentTemplate.template.mappings, | |
_source: { | |
mode: featureMapEntry.features.synthetic_source ? 'synthetic' : 'stored', | |
}, | |
}, | |
}, | |
}; | |
await esClient.cluster.putComponentTemplate({ | |
name: componentTemplateName, | |
// @ts-expect-error - TODO: Remove when ES client typings include support for synthetic source | |
body, | |
}); | |
} | |
if (isTSDBOptInChanged && featureMapEntry.features.tsdb) { | |
const mappingsProperties = componentTemplate.template?.mappings?.properties ?? {}; | |
// All mapped fields of type keyword and time_series_dimension enabled will be included in the generated routing path | |
// Temporarily generating routing_path here until fixed in elasticsearch https://github.com/elastic/elasticsearch/issues/91592 | |
const routingPath = builRoutingPath(mappingsProperties); | |
if (routingPath.length === 0) continue; | |
const indexTemplateRes = await esClient.indices.getIndexTemplate({ | |
name: featureMapEntry.data_stream, | |
}); | |
const indexTemplate = indexTemplateRes.index_templates[0].index_template; | |
const indexTemplateBody = { | |
...indexTemplate, | |
template: { | |
...(indexTemplate.template ?? {}), | |
settings: { | |
...(indexTemplate.template?.settings ?? {}), | |
index: { | |
mode: 'time_series', | |
routing_path: routingPath, | |
}, | |
}, | |
}, | |
}; | |
await esClient.indices.putIndexTemplate({ | |
name: featureMapEntry.data_stream, | |
body: indexTemplateBody, | |
}); | |
} | |
} | |
// Update the installation object to persist the experimental feature map | |
await updateDatastreamExperimentalFeatures( | |
soClient, | |
packagePolicy.package.name, | |
packagePolicy.package.experimental_data_stream_features | |
); | |
// Delete the experimental features map from the package policy so it doesn't get persisted | |
delete packagePolicy.package.experimental_data_stream_features; | |
} |
Is this where rollover should happen
Automatic rollover should happen in the above server-side code. I don't think it'd be too challenging, but there might be caveats of which I am unaware. Issue at #143448
Ideally there would be a "data management" part in Fleet (or outside) where on a change the policy API could just make a call with the requested change and would get back if it was applied or not, not having to think about templates, data streams etc.
I'm in agreement here. We've talked about something like a /api/fleet/customize
API to handle cases like this around pushing custom configuration objects. @joshdover has a great pass at designing this kind of API at #121118, namely the Customize API
section feels relevant here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see this is already all started to be tracked. The reason I'm keen on #143448 to happen is that I think at the moment when a user "toggles" any of the features, the user assumes it is working now but in reality the user has to trigger a rollover first. We already do rollovers on upgrades when needed so I would assume the functionality + error handling mostly already exists.
/api/fleet/customize
I might go further and not put it into the fleet
plugin. From my perspective it goes so far, that this is a data management API that should be exposed by Kibana "core" and is only used by Fleet. Need to read up on the full issue and comment there.
// It seems this logic does check for the data stream setting. What if template and data stream are not in sync (no rollover yet) | ||
// Is there a way in Elasticsearch to say: For data stream X, what template will apply? The reason I'm asking | ||
// is that there is a chance that users made some modification in the settings or that for a namespace an additional | ||
// template was set in place. So if we only look at the data stream, the info might be wrong / outdated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this logic does check for the data stream setting. What if template and data stream are not in sync (no rollover yet)
This is correct. Our current implementation does not consider whether the currently "visible" value of a given toggle has actually been applied (rolled over) to the given data stream. I think this okay-ish for now, since the features are presented quite heavily as experimental, and I expect them to be removed in favor of package spec support as the source of truth in the near future.
If we had automated rollover #143448, I think this issue would go away, as well. So, if that's low hanging fruit as I think it is, maybe that's the quick solve here.
Is there a way in Elasticsearch to say: For data stream X, what template will apply?
The reason I'm asking is that there is a chance that users made some modification in the settings or that for a namespace an additional template was set in place. So if we only look at the data stream, the info might be wrong / outdated.
This I do not know. Poking around the index template and data stream docs for Elasticserach didn't turn up much, but there might be some internal API to expose this. Or maybe there's a clever search operation we could use here. Would be interesting to look into it further, but not something I'm immediately aware of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know in case the Fleet team is interested in having such an API, I might be able to contribute it to Elasticsearch.
@@ -143,6 +147,7 @@ export const PackagePolicyInputPanel: React.FunctionComponent<{ | |||
const match = packagePolicy.package!.experimental_data_stream_features.find( | |||
(feat) => feat.data_stream === dsName | |||
); | |||
// If the user updates the TSDB setting, will this make its way to the template? Could not find where. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep happens here:
kibana/x-pack/plugins/fleet/server/services/package_policies/experimental_datastream_features.ts
Lines 122 to 140 in 8652775
const indexTemplateBody = { | |
...indexTemplate, | |
template: { | |
...(indexTemplate.template ?? {}), | |
settings: { | |
...(indexTemplate.template?.settings ?? {}), | |
index: { | |
mode: 'time_series', | |
routing_path: routingPath, | |
}, | |
}, | |
}, | |
}; | |
await esClient.indices.putIndexTemplate({ | |
name: featureMapEntry.data_stream, | |
body: indexTemplateBody, | |
}); | |
} |
@@ -538,6 +538,7 @@ export function parseDataStreamElasticsearchEntry( | |||
parsedElasticsearchEntry.privileges = expandedElasticsearch.privileges; | |||
} | |||
|
|||
// we parse the settings from the data stream, what abou tthe values from the index template? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I'm not following this one. Could you provide any more info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To decide what to show to the user, we parse the value of the current data stream setting. But what if through a package upgrade the value of the template has been adjusted but the data stream still has the old value. This partially goes back to the above discussion around triggering a rollover when this happens which would resolve this mostly.
The second missing part also what we touched on above: There is no way currently to tell which template applies to a data stream. We know it today because of the convention but if we would have templates per namespace too, it gets trickier. An ES API for it would help here.
The more I think about all these scenarios the more I think all this handling should only be temporary in Fleet itself but be done in Elasticsearch itself. Tell Elasticsearch to change a setting for a data stream and it will do all the updates and rollovers needed ...
// TODO: Is this the place that index_mode should be added as it belongs into template.settings.index.mode | ||
// On package upgrade, what happens if the previous package version had index_mode: time_series and now it is | ||
// switched back? This would work for the template but not when the data stream is rolled over | ||
// Ideally TSDB can be disabled again but is not possible today -> validation on the package side needed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good point that we could set index.mode
as normal here, but the next rollover would fail. Not a very good experience around upgrading package versions in that case. We could probably catch the case where a package tried to disable tsdb
after enabling it in a previous version here and throw an error, but it'd result in a package version that cannot be upgraded to. So, it makes the most sense to add some package spec validation here blocking a release that includes a change from index_mode: time_series
-> default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I talked to the Elasticsearch team about the issue that tsdb currently can't be disabled. elastic/elasticsearch#92544 was opened to make it possible to turn it off again. If this lands, this problem should go away.
@@ -281,6 +285,7 @@ export function buildComponentTemplates(params: { | |||
(dynampingTemplate) => Object.keys(dynampingTemplate)[0] | |||
); | |||
|
|||
// In case of index_mode: time_series this should be set to true by default except it is overwritten |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me - we should try to map out the relationship between synthetic source + time series indexing in code to make this clear. Rather have some declarative structure that describes that relationship rather than a bunch of chained conditionals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++. There should be a single place to look up and change all this logic. Similar to a state machine.
// These are tests for source_mode set in the package. Couldn't find any integration tests around the policy part? | ||
// Is this all covered by unit tests? I'm especially curious how an upgrade around this feature is tested? | ||
// And how do you test it manually with uploading a package when you run from source -> run your own registry? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are tests for source_mode set in the package. Couldn't find any integration tests around the policy part?
I don't think we have integration tests for the toggling functionality/API interactions here, so those would probably be good to add.
We do have unit tests around the policy generation in
- https://github.com/elastic/kibana/blob/86527753a2db307138ea69e0ba04551717a91844/x-pack/plugins/fleet/common/services/package_to_package_policy.test.ts
- https://github.com/elastic/kibana/blob/main/x-pack/plugins/fleet/common/services/simplified_package_policy_helper.test.ts
There's UI driven tests around the toggling that assert on whether we're generating the expected API payloads, defaulting to the expected values based on the existing data streams, etc here as well: https://github.com/elastic/kibana/blob/86527753a2db307138ea69e0ba04551717a91844/x-pack/plugins/fleet/public/applications/fleet/sections/agent_policy/create_package_policy_page/components/steps/components/experimental_datastream_settings.test.tsx
And how do you test it manually with uploading a package when you run from source -> run your own registry?
We create a new version of package that includes time series features, copy that new version over to a data directory, then point the local registry at that data directory. Then, the test package version is available as normal in Kibana, Fleet API, etc for testing upgrades. We don't use the upload package feature in this case. The registry service takes a config that includes locations of package manifests on disk, so we can't point it at a few directories that it'll use to source packages.
source_mode: "synthetic" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have some assertions around updates in
describe('updates all assets when updating a package to a different version', async () => { |
source_mode
or index_mode
are toggled. Definitely worth adding some, especially since the package spec values will be the sole source of truth in the future and the UI toggles will be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kpollich Appreciate the time you spent to respond to all the comments in detail and share more background info. It's good to see the team is already aware of many of the challenges.
One take away for me is: Fleet has to do a lot of data management / changes to data streams, templates and from my perspective more of this should happen in Elasticsearch directly. Please reach out if you have some ideas on what you would like to see in Elasticsearch directly.
// This sets the features on change, will it update data stream itself? | ||
// Is this were rollover should happen? | ||
// Ideally there would be a "data management" part in Fleet (or outside) where on a change | ||
// the policy API could just make a call with the requested change and would get back if it was applied or | ||
// not, not having to think about templates, data streams etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see this is already all started to be tracked. The reason I'm keen on #143448 to happen is that I think at the moment when a user "toggles" any of the features, the user assumes it is working now but in reality the user has to trigger a rollover first. We already do rollovers on upgrades when needed so I would assume the functionality + error handling mostly already exists.
/api/fleet/customize
I might go further and not put it into the fleet
plugin. From my perspective it goes so far, that this is a data management API that should be exposed by Kibana "core" and is only used by Fleet. Need to read up on the full issue and comment there.
// It seems this logic does check for the data stream setting. What if template and data stream are not in sync (no rollover yet) | ||
// Is there a way in Elasticsearch to say: For data stream X, what template will apply? The reason I'm asking | ||
// is that there is a chance that users made some modification in the settings or that for a namespace an additional | ||
// template was set in place. So if we only look at the data stream, the info might be wrong / outdated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know in case the Fleet team is interested in having such an API, I might be able to contribute it to Elasticsearch.
@@ -538,6 +538,7 @@ export function parseDataStreamElasticsearchEntry( | |||
parsedElasticsearchEntry.privileges = expandedElasticsearch.privileges; | |||
} | |||
|
|||
// we parse the settings from the data stream, what abou tthe values from the index template? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To decide what to show to the user, we parse the value of the current data stream setting. But what if through a package upgrade the value of the template has been adjusted but the data stream still has the old value. This partially goes back to the above discussion around triggering a rollover when this happens which would resolve this mostly.
The second missing part also what we touched on above: There is no way currently to tell which template applies to a data stream. We know it today because of the convention but if we would have templates per namespace too, it gets trickier. An ES API for it would help here.
The more I think about all these scenarios the more I think all this handling should only be temporary in Fleet itself but be done in Elasticsearch itself. Tell Elasticsearch to change a setting for a data stream and it will do all the updates and rollovers needed ...
// TODO: Is this the place that index_mode should be added as it belongs into template.settings.index.mode | ||
// On package upgrade, what happens if the previous package version had index_mode: time_series and now it is | ||
// switched back? This would work for the template but not when the data stream is rolled over | ||
// Ideally TSDB can be disabled again but is not possible today -> validation on the package side needed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I talked to the Elasticsearch team about the issue that tsdb currently can't be disabled. elastic/elasticsearch#92544 was opened to make it possible to turn it off again. If this lands, this problem should go away.
@@ -281,6 +285,7 @@ export function buildComponentTemplates(params: { | |||
(dynampingTemplate) => Object.keys(dynampingTemplate)[0] | |||
); | |||
|
|||
// In case of index_mode: time_series this should be set to true by default except it is overwritten |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++. There should be a single place to look up and change all this logic. Similar to a state machine.
The draft PR has several purpose
This PR is not expected to be merged but to have conversations.