Allow resolution of Data View without resolving all fields #139340

Closed · Tracked by #166211
miltonhultgren opened this issue Aug 24, 2022 · 33 comments
Labels

- Feature:Data Views (Data Views code and UI - index patterns before 8.0)
- Feature:Logs UI (Logs UI feature)
- Feature:Metrics UI (Metrics UI feature)
- impact:medium (Addressing this issue will have a medium level of impact on the quality/strength of our product.)
- loe:needs-research (This issue requires some research before it can be worked on or estimated)
- Team:DataDiscovery (Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL.)
- Team:obs-ux-infra_services (Observability Infrastructure & Services User Experience Team)
- Team:obs-ux-logs (Observability Logs User Experience Team)

Comments

@miltonhultgren
Contributor

When calling dataViewsService.get(dataViewId), the fields inside that data view are resolved at the same time, which adds a decent chunk to the time-to-resolution and blocks rendering until it is done.
There are cases in the Logs and Metrics UI where we would prefer to defer the field resolution to a later stage while still integrating with the Data Views service (for example, use the index pattern and timestamp field right away but not offer autocompletion until later).

Would it be possible to make field resolution optional until requested?
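
For illustration, here is a rough sketch of the call flow we have in mind. None of these names exist today: the `resolveFields` option, `loadFields()`, and the `DeferredDataView`/`DataViewsServiceSketch` shapes are made up to show the idea.

```ts
// Hypothetical shapes only, to illustrate the request; none of this exists in Kibana today.
interface DeferredDataView {
  title: string;                // the index pattern, e.g. "logs-*"
  timeFieldName?: string;       // the timestamp field used for sorting/tiebreaking
  loadFields(): Promise<void>;  // resolves the field list on demand (via _field_caps)
}

interface DataViewsServiceSketch {
  get(id: string, options?: { resolveFields?: boolean }): Promise<DeferredDataView>;
}

async function mountLogsPage(dataViews: DataViewsServiceSketch, dataViewId: string) {
  // Resolve only the cheap metadata up front instead of waiting on _field_caps.
  const dataView = await dataViews.get(dataViewId, { resolveFields: false });

  // The index pattern and timestamp field are enough for the first render.
  console.log(dataView.title, dataView.timeFieldName);

  // Fetch the fields only later, e.g. once filter suggestions are needed.
  await dataView.loadFields();
}
```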

@miltonhultgren added the Feature:Metrics UI, Feature:Data Views, Feature:Logs UI, Team:Infra Monitoring UI - DEPRECATED, and Team:AppServicesSv labels Aug 24, 2022
@elasticmachine
Contributor

Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)

@elasticmachine
Contributor

Pinging @elastic/kibana-app-services (Team:AppServicesSv)

@mattkime
Contributor

> Would it be possible to make field resolution optional until requested?

It is, but I'd like to have a thorough understanding before this is implemented. Generally speaking, we expect field list loading to be fast, so I'm curious about the cases where this isn't true.

What is the priority on this? Is it tied to any high-priority items?

@weltenwort
Member

The _field_caps call that is performed to load the field list can take a while when using CCS. It is a common deployment topology for observability to have region-specific or team-specific monitoring/logging clusters and then to combine several of them via CCS into cross-region/cross-team clusters.

In those situations, get() becomes a bottleneck for the UI since it can wait tens of seconds until _field_caps returns and the data view instance is returned.
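
For context, the field loading boils down to a _field_caps request roughly like the following; with a CCS pattern it fans out to every remote cluster and only completes once the slowest one has answered (the endpoint and cluster names are just examples):

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' }); // example endpoint

async function loadFieldCaps() {
  // With CCS, the request is forwarded to each remote cluster and the response
  // can only be assembled once the slowest remote has replied.
  const response = await client.fieldCaps({
    index: 'logs-*,eu-cluster:logs-*,us-cluster:logs-*', // example cross-cluster pattern
    fields: '*',
  });
  console.log(`${Object.keys(response.fields).length} fields resolved`);
}
```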

@mattkime
Contributor

@weltenwort Are the speed concerns still an issue with the current state of _field_caps? elastic/elasticsearch#84504

As far as I know, there's no way it should be taking tens of seconds.

Taking a step back, I'm happy to provide data views with async field loading, but I want to make sure I understand our current limitations.

@weltenwort
Member

The _field_caps request with CCS probably needs to wait for the slowest cluster. This is an example trace with just one lightly loaded remote cluster:

[screenshot: APM trace of the data view load]

You can see that the call made while loading the data view takes 4 s with 3.3 s of that being taken up by the _field_caps call to ES.

@mattkime
Contributor

mattkime commented Aug 30, 2022

@weltenwort Which version of the stack is that? I'd also be interested in other relevant details: what does 'lightly loaded' mean? How many fields?

I'm being stubborn about this because _field_caps being relatively fast is a core assumption. Overturning that would involve a fair amount of work and therefore diligence. Maybe these are the first steps.

Ideally we'd be seeing sub-second responses.

@mattkime
Contributor

@dnhatn I noticed your work on benchmarks for the field caps API. Do we have a better idea of what we can expect performance-wise?

@matschaffer
Contributor

matschaffer commented Aug 31, 2022

Wondering if @pugnascotia 's ES tracing work might help confirm what we're waiting on for those 3.3s 🤔

@pugnascotia
Contributor

It would at least give you an idea of what tasks are being executed.

@weltenwort
Member

The clusters are managed by the observability dev productivity team's tooling. These are the details I could find, where "production" is the cluster that my Kibana instance runs on and "remote" is the cluster that is accessed via CCS:

production cluster

remote cluster

- Elasticsearch: elastic/elasticsearch@d0250dd
- realistic observability logs + metrics ingestion for a few example apps
- stable and healthy with 10 nodes and about 5500 indices

[screenshot: cluster overview]

Is there a way we can enable tracing on those clusters in a non-destructive way?

@dnhatn
Member

dnhatn commented Aug 31, 2022

I think 3 seconds is possible if the cluster has 1000+ indices. We have another optimization in elastic/elasticsearch#86323; however, it's still unmerged. I will try to get it in this week. This optimization should reduce the latency to sub-second.

@weltenwort
Member

weltenwort commented Aug 31, 2022

> I think 3 seconds is possible if the cluster has 1000+ indices.

Right, this is not about a few seconds being too slow when the request hits that number of indices. It's about not being able to avoid that wait when loading a data view, even when the component doesn't need the field list right away.

> This optimization should reduce the latency to sub-second.

That sounds amazing, thank you.

@mattkime
Contributor

I'm glad we had this discussion to help emphasize the importance of @dnhatn 's optimization work.

@javanna
Member

javanna commented Sep 6, 2022

Thanks for making this connection @mattkime. Please ping us whenever you hit this kind of problem around calling field_caps (or any other API, really); otherwise we don't even get to know that there are issues you folks are looking to work around :)

@miltonhultgren
Contributor Author

So is the conclusion that we aim to make field resolution so fast that it's not an issue to resolve the fields even when they're not needed at all times?
And do we expect that, even for CCS use cases, this will be fast enough not to block rendering noticeably?

@mattkime
Contributor

mattkime commented Sep 8, 2022

@miltonhultgren Yes, although these aren't necessarily mutually exclusive paths. What is the use case for loading a data view without the fields? I'd like to get into the details of what you're doing since I'll often learn something useful. Yes, I understand that initially you just need the index pattern and timestamp field but I'd still like to learn more.

It looks like the case that might have taken 3s will now take about 0.3s. Is 0.3s meaningful in this case? I'm unaware of time to load being optimized to this degree elsewhere.

All the data view code assumes the field list exists once a DataView instance has been instantiated. This would be a significant change. If we were to rewrite the data views code, I'd definitely defer loading the field list. I'm trying to figure out the priority of making this change.

@miltonhultgren
Contributor Author

> What is the use case for loading a data view without the fields?

We have two use cases today: one in Logs and one for a Lens table that shows host metrics. In Logs, we use the data view to resolve which indices to load logs from, and we use the timestamp field as a tie breaker for sorting (I think). In the Logs case we do want the fields, but at a later time, to suggest fields for filters or to change which fields to show from the log documents; this doesn't need to block the initial page load.

For the new Lens table, we don't need the fields at all since we simply want to load the right metrics from the right index, and no autocompletion needs to happen for that table (though later it might be filtered through unified search).

In the rest of the Metrics UI we follow a similar pattern: initially we only load the metrics from the right index and defer field resolution until a bit later, when it's needed.

So it really just boils down to wanting to defer work so that an initial render with useful data can happen sooner.
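
As a rough sketch of that deferral pattern (the `loadFieldsFor` helper is hypothetical, standing in for whatever ends up performing the _field_caps call, and the import path may differ by Kibana version):

```ts
import type { DataView, DataViewField } from '@kbn/data-views-plugin/common';

let fieldsPromise: Promise<DataViewField[]> | undefined;

// Render immediately with what the data view already knows (indices, timestamp field)
// and only trigger the field load once something actually needs the field list.
function ensureFields(dataView: DataView): Promise<DataViewField[]> {
  fieldsPromise ??= loadFieldsFor(dataView); // kicked off at most once, never blocks the first render
  return fieldsPromise;
}

// Hypothetical helper wrapping the _field_caps-backed field resolution.
declare function loadFieldsFor(dataView: DataView): Promise<DataViewField[]>;
```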

> Is 0.3s meaningful in this case?

No, I don't think so.

> This would be a significant change.

Understood. I think we'd do best to wait and see how the optimization performs, especially in CCS setups with slower networks/remotes, and at what percentile we might still see such load times. We'll also need to gather more accurate data on this, preferably from real deployments that are properly sized for the workload (the Edge cluster isn't).

@mattkime
Contributor

Sounds good. I'll think about how we might do this as smaller efforts instead of one big push.

@exalate-issue-sync bot added the impact:needs-assessment and impact:medium labels and removed the impact:needs-assessment label Sep 19, 2022
@smith removed the Team:Infra Monitoring UI - DEPRECATED label Sep 19, 2022
@dnhatn
Member

dnhatn commented Nov 3, 2022

I have merged elastic/elasticsearch#86323. I think it should unblock the work here.

@kertal added the bug label Nov 18, 2022
@petrklapka added the Team:DataDiscovery label and removed the Team:AppServicesSv label Nov 28, 2022
@elasticmachine
Contributor

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

@kertal
Member

kertal commented Feb 1, 2023

So the ask would be to, for example, add a param to dataViewsService.get(dataViewId) that allows getting the data view without resolving all fields, or to create a separate function like getWithoutFields, right? That sounds like a feature request more than a bug.
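
Purely as an illustration, the two options could look something like this (neither signature exists today):

```ts
import type { DataView } from '@kbn/data-views-plugin/common';

interface DataViewsServiceProposal {
  // (a) opt out of field resolution via a parameter on the existing method
  get(dataViewId: string, options?: { skipFields?: boolean }): Promise<DataView>;

  // (b) a dedicated method that never calls _field_caps
  getWithoutFields(dataViewId: string): Promise<DataView>;
}
```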

@kertal removed the bug label Feb 1, 2023
@StephanErb

I thought I'd share a bit of experience from the field: we have now updated our production cluster to 8.6.1 with the latest field caps improvement (elastic/elasticsearch#86323). Unfortunately, performance is still not optimal for us: field_caps for metricbeat* is still in the 10-30s range. This runtime appears to be mostly dominated by frozen nodes with 800-1000 shards. The mappings are dynamic, so most indices have slightly different mappings. The mappings have several thousand fields.

I also fear that the problem will get worse with TSDB and synthetic source. With the good compression ratio of TSDB combined with the planned primary shard cap at 200M documents (elastic/elasticsearch#87246), a single frozen node will be holding significantly more shards in the future. I would thus expect performance to deteriorate further.

@mattkime
Contributor

@StephanErb

> This runtime appears to be mostly dominated by frozen nodes with 800-1000 shards.

Having frozen indices within the metricbeat-* index pattern isn't something we've worked on, as placing data on frozen tiers is a choice to lower cost at the expense of speed.

I think the solution should be to make sure the frozen indices are not available to the metricbeat-* index pattern. Is this possible? Is something in the way?

@StephanErb

StephanErb commented Feb 20, 2023

> Having frozen indices within the metricbeat-* index pattern isn't something we've worked on, as placing data on frozen tiers is a choice to lower cost at the expense of speed.

I would expect that querying data on frozen nodes leads to a slowdown. However, the mere presence of data on the frozen tier outside of the queried time range should not have a performance impact. At least, that's what my team and I have assumed so far.

We have Kubernetes and Prometheus metrics in metricbeat-*. We use an ILM policy such that after 2 days data is transitioned from hot to warm, after 7 days from warm to cold, and finally after 30 days from cold to frozen. Most of our alerts and dashboards look at the last 24 hours of data. Dashboards occasionally also look at 7 and 30 days, as those are the default time filters in Kibana. Ranges >30 days are almost never queried.

Given that a field_caps query does not contain a time range parameter, I can somewhat see where the problem is coming from. However, as frozen nodes are not used for indexing new data, the fields on them should be rather static and hopefully cacheable.

@javanna
Member

javanna commented Feb 20, 2023

Field_caps does support providing a time range filter, and runs the can_match phase to filter out irrelevant shards.
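
For example, via the Elasticsearch JS client (the endpoint and the 30-day range below are just placeholders):

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' }); // example endpoint

async function fieldCapsForRecentData() {
  // index_filter lets the can_match phase skip shards (e.g. frozen indices)
  // whose data falls entirely outside the requested time range.
  return client.fieldCaps({
    index: 'metricbeat-*',
    fields: '*',
    index_filter: {
      range: { '@timestamp': { gte: 'now-30d' } },
    },
  });
}
```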

> The mappings are dynamic so most indices have slightly different mappings.

I suspect this is the main issue, as the performance improvements build on deduplication of mappings that share the same hash, which does not apply if there are slight differences between the indices.

Field_caps performance does not depend on the number of shards, though, but rather on the number of indices with distinct mappings. It would be great to get more feedback here to see what we can improve further. Could you open an SDH around this?

@davismcphee added the loe:needs-research label Sep 8, 2023
@gbamparop added the Team:obs-ux-logs label Nov 9, 2023
@elasticmachine
Contributor

Pinging @elastic/obs-ux-logs-team (Team:obs-ux-logs)

@gbamparop added the Team:obs-ux-infra_services label Nov 9, 2023
@elasticmachine
Contributor

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

@kertal
Member

kertal commented Nov 9, 2023

Yes, we intend to do this. The next step in this direction will be #167750.

@mattkime
Contributor

mattkime commented May 3, 2024

@miltonhultgren DataViewLazy has been partially implemented. Can you look and see if it's useful for your needs? Fields are only loaded as requested, potentially saving a lot of overhead compared to regular DataViews.
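
Roughly, consumption looks like this (a sketch only; the exact method names, option shapes, and import paths may differ from the current implementation):

```ts
import type { DataViewsPublicPluginStart } from '@kbn/data-views-plugin/public';

async function loadLazily(dataViews: DataViewsPublicPluginStart, dataViewId: string) {
  // Resolving the data view itself no longer waits on a full _field_caps call.
  const dataViewLazy = await dataViews.getDataViewLazy(dataViewId);

  // The index pattern and timestamp field are available immediately.
  const indexPattern = dataViewLazy.getIndexPattern();
  const timeField = dataViewLazy.timeFieldName;

  // Fields are fetched on demand, and only the ones the caller asks for.
  const fields = await dataViewLazy.getFields({ fieldName: ['host.name', 'cloud.*'] });

  return { indexPattern, timeField, fields };
}
```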

@miltonhultgren
Contributor Author

I'm no longer involved in the apps where we used DataViews that led to me opening this issue.

@weltenwort @neptunian Is this something that you guys could look at within the current logs and metrics code bases?

@weltenwort
Member

Thanks for the pointer. We have #179128 to track its usage in the log threshold alert.

@kertal
Member

kertal commented Jul 10, 2024

@mattkime I think we can close this due to #167750?
