-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Security Solution][Endpoint] Potential improvements in endpoint user artifact packager #160387
[Security Solution][Endpoint] Potential improvements in endpoint user artifact packager #160387
Conversation
…es its usage from the task
… for each artifact type. Also generates artifacts by policy in parallel
…user_artifact_packager-6597
…user_artifact_packager-6597
…rtifacts function
// Expect 2 exceptions, the first two calls returned the same exception list items | ||
expect(resp.entries.length).toEqual(2); | ||
// Expect 1 exceptions, the first two calls returned the same exception list items | ||
expect(translated.entries.length).toEqual(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is due to a bug fix (I hope) from the old version.
Before changes, it was doing deduplication on each exceptions request call, now it is done after all exceptions have been requested, so the deduplication is over all the exceptions, not on each pagination. Let me know if this was done on this way before for any specific reason I missed.
osOptions: BuildArtifactsForOsOptions | ||
) { | ||
const policySpecificArtifacts: Record<string, InternalArtifactCompleteSchema[]> = {}; | ||
await pMap( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if this could be done in a single for
loop. My concern here is that 2 concurrent buildArtifactsForOs
writes to the cache at the same time. It shouldn't happen since the cache is filled before when doing the global artifacts. But let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow, but is this still a concern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it shouldn't be as the first one filling this cache will be exceptions for global artifacts, and those runs sequentially. So when this (by policy) is running, the cache is already populated.
See the globals in lines: 262, 284, 312, 338 and 367.
Let me know if I'm following this correctly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quite comprehensive refactor, this is looking awesome! I left some suggestions
also tested the caching, with your modification there is only 15 fetches instead of 27 - I don't know why 27, but that's the measurement 🤷 💯
...ns/security_solution/server/endpoint/services/artifacts/manifest_manager/manifest_manager.ts
Outdated
Show resolved
Hide resolved
...ns/security_solution/server/endpoint/services/artifacts/manifest_manager/manifest_manager.ts
Show resolved
Hide resolved
...ns/security_solution/server/endpoint/services/artifacts/manifest_manager/manifest_manager.ts
Show resolved
Hide resolved
...ns/security_solution/server/endpoint/services/artifacts/manifest_manager/manifest_manager.ts
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the modifications! 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing my prior set of comments.
I reveiwed this PR again and left some more comments - only one is of real concern to me (call to .bulkUpdate()
does not pass esClient
throught). Let me know if you have any questions
const res = await withPackageSpan(`Bulk delete fleet artifacts`, () => | ||
esClient.bulk({ | ||
body, | ||
refresh: 'wait_for', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks.
|
||
paging = (page - 1) * 20 + items.length < total; | ||
paging = (page - 1) * 1000 + items.length < total; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This causes me confusion (it may just be me ). You are assuming that the page size that pageSupplier()
will use is 1000
, but really, you can't be sure of that because one could alter the callbasck given to this function. That makes this a bit fragile because someone could (unknownly) change how pageSupplier()
works and not realize that this is assuming a page size of 1000
here - thus possibly causing items to be dropped or duplicated.
Suggestion: change pageSupplier()
to take in as arguments the page
and the perPage
arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good point! Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...ns/security_solution/server/endpoint/services/artifacts/manifest_manager/manifest_manager.ts
Show resolved
Hide resolved
for (const currentBatch of updateBatches) { | ||
const response = await this.packagePolicyService.bulkUpdate( | ||
this.savedObjectsClient, | ||
// @ts-expect-error TS2345 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any way we can get around this? Do we have an (internal) ESclient we can provide to the service method?
not doing so, when it is a required dependency of this service method, will likely leave us exposed to bugs in the future and possibly make fleet unstable. I checked the code in fleet for this method and it passes the esClient
on to other services, the most important one (i think) is the one that bumps the revision number on the Agent policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will revisit this and see if we can use any esClient from our service here. Note that this was done in this way before with the .update
, but good point on ensuring this works to avoid future bugs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the changes.
🚢 it
@@ -82,3 +82,7 @@ export const uniqueIdFromArtifact = < | |||
}: T): string => { | |||
return `${packageName}:${identifier}-${decodedSha256}`; | |||
}; | |||
|
|||
export const uniqueIdFromId = (id: string, packageName: string): string => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: uniqueIdFromId
is a bit confusing, is id the artifact id? We could rename the function to uniqueIdFromArtifactId
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The id of the artifact without the packageName, so it adds the packageName to the current id as an ES id: ${packageName}:${artifactId}
.
Is the same thing uniqueIdFromArtifact
does, but as we only have the id here I thought would be better to just create this new method instead of load the artifact to generate the unique Id with the existing method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also rename this to packagePrefixedId
. That might be less confusing to read. 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fleet changes LGTM
@juliaElastic what do you think about this comment in the description related to the new bulkDelete:
Is it ok not loading the artifact first and checking the package name as it is done with the single delete artifact method? |
I'm not that familiar with the artifacts API to answer this. It sounds as it should be fine to delete without loading. What happens if the artifact doesn't exist, do we catch the delete failures? |
@juliaElastic Yes, we capture all errors here: https://github.com/elastic/kibana/pull/160387/files#diff-231337d0ab0963e2654e75eecdbdabf8a771c3e3be716a5a93333bf972a4f272R222 |
@elasticmachine merge upstream |
…user_artifact_packager-6597
💛 Build succeeded, but was flaky
Failed CI StepsTest Failures
Metrics [docs]Public APIs missing comments
Unknown metric groupsAPI count
ESLint disabled line counts
References to deprecated APIs
Total ESLint disabled count
History
To update your PR or re-run it, just comment with: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes! 🎉 I've a few nits and questions but nothing that's should block to 🚢
|
||
const res = await withPackageSpan(`Bulk delete fleet artifacts`, () => | ||
esClient.bulk({ | ||
body, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: body
is deprecated over operations
for bulk
. You can rename body
to operations
. You can change the other bulk
calls in the file as well.
Similarly, the create
method (used in createArtifact
) prefers document
instead of a body
in the request, and the search
(used in listArtifacts
) method accepts sort
directly in the request instead of being nested inside a body
.
@@ -82,3 +82,7 @@ export const uniqueIdFromArtifact = < | |||
}: T): string => { | |||
return `${packageName}:${identifier}-${decodedSha256}`; | |||
}; | |||
|
|||
export const uniqueIdFromId = (id: string, packageName: string): string => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also rename this to packagePrefixedId
. That might be less confusing to read. 😅
|
||
paging = (page - 1) * 100 + response.data.length < response.total; | ||
paging = (page - 1) * 1000 + response.data.length < response.total; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you could declare this size as a const as it is the same as perPage
value.
perPage: 20, | ||
kuery: 'ingest-package-policies.package.name:endpoint', | ||
}); | ||
private async listEndpointPolicyIds() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Consider adding a return type to the function.
@@ -583,20 +664,29 @@ export class ManifestManager { | |||
this.logger.info(`Committed manifest ${manifest.getSemanticVersion()}`); | |||
} | |||
|
|||
private async listEndpointPolicies(page: number) { | |||
private async listEndpointPolicies(page: number, perPage: number) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Consider adding a return type here.
os: string; | ||
policyId?: string; | ||
schemaVersion: string; | ||
}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a return type
currentBatch | ||
); | ||
|
||
// Parse errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: // Update errors
...ns/security_solution/server/endpoint/services/artifacts/manifest_manager/manifest_manager.ts
Show resolved
Hide resolved
allPolicyIds: string[], | ||
supportedOSs: string[], | ||
osOptions: BuildArtifactsForOsOptions | ||
) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a return type
osOptions: BuildArtifactsForOsOptions | ||
) { | ||
const policySpecificArtifacts: Record<string, InternalArtifactCompleteSchema[]> = {}; | ||
await pMap( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow, but is this still a concern?
@ashokaditya Thanks for the review and your suggestions, if it's ok to you, I will add all nits in a follow up pr I'm working on. |
## Summary - Reduces fleet artifact get calls by getting them all at one time instead of requesting it one by one using id's - Updates unit test. - Adds duration time log to endpoint packager task. - Adds missing return types to functions. Follow up of: #160387
#163509) ## Summary Removes pMap and uses a for loop instead when fetching exceptions (as it was done before) since exceptions are cached and there is no need for parallel code here. Original PR with previous changes: #160387 Co-authored-by: Kibana Machine <[email protected]>
Summary
This pr includes different changes to:
.kibana
and.fleet-artifacts
indicesendpoint:user-artifact-packager
task.Actions taken:
Do package policy updates in parallel using
pMap
. This is the slowest phase of the task as a package policy update takes a few seconds to be done. Doing it in parallel (with concurrency = 10) speeds up this phase, specially when there are a lot of policies. The concurrency value could be replaced by a config parameter as a safe option to decrease or increase the concurrency.Introduces new fleet artifacts client functionality for
bulk
delete. It is used in thedeleteArtifacts
function in manifest manager but also in thecleanup
function. Artifacts deletion are done always there is an artifact update as artifacts are not updated, instead, it does create + delete. The single delete artifact does a previous artifact verification by getting it by id and ensuring it is from the current package. This is not done now in the new bulk operation, but let me know if this is strongly needed. This was also increasing the number of calls we were doing into.fleet-artifacts
index as an update/create/delete of a global exception might impact all the artifacts, so all artifacts being created and deleted.Package policy ids where loaded (using a pagesize of 20) on each artifact generation (so for each policy and OS). This was causing a lot of
.kibana
index requests, specially for the cases with a lot of policies. Now all the policy ids are loaded at the beginning (with a 100 page size) and used, so a few.kibana
index queries done for all the artifacts, policyIds an OS's.Exceptions (Event Filters, Trusted Apps, endpoint exceptions, etc) were queried for each OS and policyId, this also ended up doing a lot of requests in
.kibana
index. Now it does a query by ListId + OS one time and stores the result in a cache which is reused for each policy. Now we do less queries and the performance has improved.All unit and integration tests have been updated according to the changes.
The behaviour has changed with all these changes, but the end result should be the same.
TBD: