-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Fleet] Optimize package installation performance, phase 1 #130906
Conversation
// Datastream now throw an error if the aliases field is present so ensure that we remove that field. | ||
const getTemplateRes = await retryTransientEsErrors( | ||
() => | ||
esClient.indices.getIndexTemplate( | ||
{ | ||
name: templateName, | ||
}, | ||
{ | ||
ignore: [404], | ||
} | ||
), | ||
{ logger } | ||
); | ||
|
||
const existingIndexTemplate = getTemplateRes?.index_templates?.[0]; | ||
if ( | ||
existingIndexTemplate && | ||
existingIndexTemplate.name === templateName && | ||
existingIndexTemplate?.index_template?.template?.aliases | ||
) { | ||
const updateIndexTemplateParams = { | ||
name: templateName, | ||
body: { | ||
...existingIndexTemplate.index_template, | ||
template: { | ||
...existingIndexTemplate.index_template.template, | ||
// Remove the aliases field | ||
aliases: undefined, | ||
}, | ||
}, | ||
}; | ||
|
||
await retryTransientEsErrors( | ||
() => esClient.indices.putIndexTemplate(updateIndexTemplateParams, { ignore: [404] }), | ||
{ logger } | ||
); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT this isn't necessary. Creating an index template without the aliases
field at all seems to have the same behavior in my manual testing and this is already what the main create index template logic below does.
@nchaulet It looks like you wrote this originally, do you know if there's anything I'm missing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the original issue: #90984
It appears it was from an upgrade of the endpoint package that I can no longer reproduce, even using the old versions mentioned in the issue 0.16.2 -> 0.17.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it could be reproduced if you upgraded from 7.10.0 to main, but this is not strictly supported since Fleet was not GA until 7.14.0. I think this is a safe thing to remove.
35c863f
to
ca44aeb
Compare
Pinging @elastic/fleet (Team:Fleet) |
great work, should we run the script that install all packages to verify nothing breaks? |
Good idea. I'm also going to move the APM instrumentation into a separate PR so if this needs to get reverted for any reason we still have the instrumentation in. This also makes it easier to create comparison traces before/after the optimizations. |
dc11c73
to
ea84f45
Compare
There's some work that will need to be done to refactor the ES asset reference passing to be able to remove these refreshes on the SO index. A lot of logic is depending on updating this object and then querying it back which needs to be refactored to simply keeping a running list in memory that gets updated as we go without querying it back to avoid the needs for refreshes. |
10c4fa6
to
151acb6
Compare
esReferences = await updateEsAssetReferences(savedObjectsClient, packageInfo.name, esReferences, { | ||
assetsToAdd: ilmPolicies.map((policy) => ({ | ||
type: ElasticsearchAssetType.ilmPolicy, | ||
id: policy.name, | ||
})), | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We weren't previously keeping track of these references so these weren't getting deleted on uninstall. Fixed in this PR.
return clusterPromise; | ||
return await clusterPromise; | ||
} catch (e) { | ||
if (e?.statusCode === 400 && e.body?.error?.reason.includes('already exists')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the error message that will be returned by ES if the custom component template already exists. Unfortunately there's not a better machine readable field name.
id: 'endpoint.metadata-default-0.16.0-dev.0', | ||
type: 'transform', | ||
}, | ||
{ | ||
id: 'endpoint.metadata-default-0.16.0-dev.0', | ||
id: 'endpoint.metadata_current-default-0.16.0-dev.0', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order changed here, but it doesn't really matter.
@@ -121,6 +123,41 @@ export async function installKibanaAssets(options: { | |||
|
|||
return installedAssets; | |||
} | |||
|
|||
export async function installKibanaAssetsAndReferences({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extracted this from _install_package
to make mocking simpler
// Because Kibana assets are installed in parallel with ES assets with refresh: false, we almost always run into an | ||
// issue that causes a conflict error due to this issue: https://github.com/elastic/kibana/issues/126240. This is safe | ||
// to retry constantly until it succeeds to optimize this critical user journey path as much as possible. | ||
pRetry( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note this retry logic will retry failures other than conflict errors. I think that's probably ok, but happy to fix if we think it's a bad idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a chance 20 retry takes very long in case of trying to retry a persistent error?
151acb6
to
d22c8dd
Compare
I've been able to shave off another ~3-4 seconds with two additional optimizations but I'm going to move those to a separate PR so we can focus on getting this one in first. |
💚 Build SucceededMetrics [docs]
History
To update your PR or re-run it, just comment with: cc @joshdover |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I haven't tested locally
Summary
Phase 1 of #110500
Depends on the following to be merged first:
In conjunction with elastic/elasticsearch#86017, this PR improves package installation time significantly. I tested this with the package install script that installs all packages, without any failures.
refresh: false
for several of the Saved Object operations that we don't need to wait for@custom
templates to use thecreate=true
query parameter rather than a check and set (removes an API call)Another 1-2 seconds will be shaved off by #115032
The largest change in this PR is related to removing the
refresh
on each of the Saved Object update calls for updating theinstalled_es
array on theepm-packages
Saved Object. Prior to this PR, each update of this array involved querying the current value and then writing an update back. In order to be able to remove therefresh
, we have to avoid needing to query the current value and instead keep the current value in memory. This requires keeping track of the current array and passing it into each asset type's install function.This change also exposes a known issue with Saved Object update calls that could result in conflict errors even when not using optimistic concurrency control (see #126240). In order to work around this problem, I added simple retry logic which seems to work well.
I've deferred on a couple more optimizations to a separate PR that will shave off another few seconds:
refresh: false
on the bulk Saved Object import, blocked on Expose refresh option to SavedObjectsImporter #131339Results
To install the system package, version 1.13.0:
main
- 20smain
+ Batch execute template and pipeline cluster state operations elasticsearch#86017 - 13.4sWhen installing all packages, the 95p changes significantly:
main
- 21sView APM traces
main
main
+ elastic/elasticsearch#86017This branch + elastic/elasticsearch#86017
View install script results
This branch + elastic/elasticsearch#86017