-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Visualizations] Expensive queries are causing unnecessary load and delays on Elasticsearch #99031
Conversation
…elays on Elasticsearch Part of: elastic#93770
space, | ||
past_days: getPastDays(lastUpdated), | ||
}; | ||
const finder = await soClient.createPointInTimeFinder({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my question applies to all refactored telemetries (#98903, #98914, #99023) where we replaced esClient
with soClient
.
During my testing today, I found that this finder
returns only data for the default
namespace. If I create visualizations for a non-default
namespace, this will not be reflected in the telemetry.
I've tried to fix that by adding namespaces: ['*'],
but with that argument pagination doesn't work. see:
@lukeelmers @afharo could you please help me a little cause we have a recommendation to use soClient
( #88604 (comment) ) for telemetry with pagination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah we specifically decided against adding namespace support when using PIT until we had a concrete need for it. I guess we do now 🙂
I've created an issue to track this enhancement: #99044
@afharo WDYT, do we push forward on this work without soClient
(or with using traditional pagination which typically works with <10k results), or do we wait until such a time as we have namespace support in #99044?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vote to wait, because the code with soClient
looks easier to me + we don't need to implement paging ourselves. If I'm not mistaken in esClient
we don't have a simple wrapper like createPointInTimeFinder
. + #88604 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we push forward on this work without
soClient
IMO, it feels like we keep coming back and forth with this conversation. And, usually, AFAIK, the general recommendation is to use the SO Client whenever possible because using plain ES client requests would break if/when we change the internal implementations of SOs (underlying indices, ways to store them, ...).
I'm not fully familiar with the limitations of allowing namespaces: ['*']
. I need to better understand the reasons exposed in #99044 before recommending migrating to esClient
as a workaround :)
So ++ to wait :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been playing around with this PR locally, and I think these changes would solve the issue: alexwizp#6
I tested it locally and it works as expected to me :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, I don't know how I thought it worked when I tried... It's def not working 🤷
Sorry about that! Then, we need to rely on the outcome of #99044
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I guess the question remains, do we adjust the scope of this PR as follows?
Size parameter was changed from 10000 to 1000- Code was refactored to use soClient instead of esClient
The paging is used through the soClient.createPointInTimeFinder method- Code was optimized, extra cycles were removed
Basically, since @alexwizp has already started the work, is it worth merging with the same 10k size, only switching to SO client instead? Or do we close this PR and consider it blocked by #99044? WDYT @alexwizp @afharo
(Either way it doesn't actually resolve #93770, it would just be a small refactor if we moved ahead)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a lot of sense just to switch esClient
-> soClient
. We started that work to resolve #93770.
But on the other hand I'm ok to use soClient
with size === 10000
. it adds more contributions to my piggy bank =)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add #99044 to our list of items to discuss for our next sprint. I don't think it would be a particularly large amount of work, it's mostly about deciding on a strategy with the security team & ensuring there's adequate test coverage.
@elasticmachine merge upstream |
@elasticmachine merge upstream |
merge conflict between base and head |
const visSummaries: VisSummary[] = esResponse.hits.hits.map((hit) => { | ||
const spacePhrases = hit._id.toString().split(':'); | ||
const lastUpdated: string = get(hit, '_source.updated_at'); | ||
const space = spacePhrases.length === 3 ? spacePhrases[0] : 'default'; // if in a custom space, the format of a saved object ID is space:type:id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code was removed because we can get space in more legal way from so.namespaces
Pinging @elastic/kibana-app (Team:KibanaApp) |
@afharo @stratoula, blocker issue was fixed - PR was rebased to latest master, tested locally and ready to your final review |
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM, tested it locally, works fine
💚 Build SucceededMetrics [docs]
History
To update your PR or re-run it, just comment with: cc @alexwizp |
…elays on Elasticsearch (elastic#99031) * [Visualizations] Expensive queries are causing unnecessary load and delays on Elasticsearch Part of: elastic#93770 * fix CI * fix typo * fix namespaces issue * fix tests Co-authored-by: Kibana Machine <[email protected]>
…elays on Elasticsearch (#99031) (#110447) * [Visualizations] Expensive queries are causing unnecessary load and delays on Elasticsearch Part of: #93770 * fix CI * fix typo * fix namespaces issue * fix tests Co-authored-by: Kibana Machine <[email protected]> Co-authored-by: Kibana Machine <[email protected]>
…eporting-to-v2 * 'master' of github.com:elastic/kibana: (120 commits) [Lens] should register "suffix" field formatter in setup lifecycle (elastic#110218) skip flaky suite (elastic#98463) skip flaky suite (elastic#108633) [Data Table] Expensive queries are causing unnecessary load and delays on Elasticsearch (elastic#98903) fixes failing tests (elastic#110436) [TSVB] Remove deprecated `IFieldType` (elastic#110404) [Lens] Remove deprecated `IFieldType` (elastic#109825) [Vega] Expensive queries are causing unnecessary load and delays on Elasticsearch (elastic#99023) [Visualizations] Expensive queries are causing unnecessary load and delays on Elasticsearch (elastic#99031) [TSVB] Expensive queries are causing unnecessary load and delays on Elasticsearch (elastic#98914) Don't add split part of UI if we have one series (elastic#109483) [Discover] Migrate angular routing to react router (elastic#107042) [Security Solution][Endpoint][Event Filters] Fixes missing spacers between event filters cards (elastic#110282) [ML] Data Grid: Fix alignment of sorting arrow when histogram charts are enabled (elastic#110053) [canvas] Fix image argument form issues (elastic#109767) Fix asset in Pitch template (elastic#109742) chore(NA): moving @kbn/securitysolution-list-api to babel transpiler (elastic#110265) chore(NA): moving @kbn/securitysolution-list-constants to babel transpiler (elastic#110269) [Fleet] Fix upgrade link in Fleet policy table (elastic#110228) [ML] APM Latency Correlations: Fix empty state (elastic#109813) ... # Conflicts: # src/plugins/data/common/query/timefilter/types.ts
…int in Time) and restructuring of folders (#124912) ## Summary Changes the usage collector telemetry within security solutions to use PIT (Point in Time) and a few other bug fixes and restructuring. * The main goal is to change the full queries for up to 10k items to be instead using 1k batched items at a time and PIT (Point in Time). See [this ticket](#93770) for more information and [here](#99031) for an example where they changed there code to use 1k batched items. I use PIT with SO object API, searches, and then composite aggregations which all support the PIT. The PIT timeouts are all set to 5 minutes and all the maximums of 10k to not increase memory more is still there. However, we should be able to increase the 10k limit at this point if we wanted to for usage collector to count beyond the 10k. The initial 10k was an elastic limitation that PIT now avoids. * This also fixes a bug where the aggregations were only returning the top 10 items instead of the full 10k. That is changed to use [composite aggregations](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html). * This restructuring the folder structure to try and do [reductionism](https://en.wikipedia.org/wiki/Reductionism#In_computer_science) best we can. I could not do reductionism with the schema as the tooling does not allow it. But the rest is self-repeating in the way hopefully developers expect it to be. And also make it easier for developers to add new telemetry usage collector counters in the same fashion. * This exchanges the hand spun TypeScript types in favor of using the `caseComments` and the `Sanitized Alerts` and the `ML job types` using Partial and other TypeScript tricks. * This removes the [Cyclomatic Complexity](https://en.wikipedia.org/wiki/Cyclomatic_complexity) warnings coming from the linters by breaking down the functions into smaller units. * This removes the "as casts" in all but 1 area which can lead to subtle TypeScript problems. * This pushes down the logger and uses the logger to report errors and some debug information ### Checklist Delete any items that are not applicable to this PR. - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
Part of: #93770
This PR is a part of #93770 for
src/plugins/visualizations/server/usage_collector/get_usage_collector.ts
What was done:
Size
parameter was changed from10000
to1000
soClient
instead ofesClient
soClient.createPointInTimeFinder
method