Allow resolution of Data View without resolving all fields #139340

Closed · Tracked by #166211
miltonhultgren opened this issue Aug 24, 2022 · 33 comments
Labels

- Feature:Data Views (Data Views code and UI - index patterns before 8.0)
- Feature:Logs UI (Logs UI feature)
- Feature:Metrics UI (Metrics UI feature)
- impact:medium (Addressing this issue will have a medium level of impact on the quality/strength of our product.)
- loe:needs-research (This issue requires some research before it can be worked on or estimated)
- Team:DataDiscovery (Discover, search (e.g. data plugin and KQL), data views, saved searches. For ES|QL, use Team:ES|QL.)
- Team:obs-ux-infra_services (Observability Infrastructure & Services User Experience Team)
- Team:obs-ux-logs (Observability Logs User Experience Team)

Comments

@miltonhultgren
Contributor

When calling dataViewsService.get(dataViewId), the fields inside that data view are resolved at the same time, which adds a decent chunk to the time-to-resolution and blocks rendering until it is done.
There are cases in the Logs and Metrics UI where we would prefer to defer the field resolution to a later stage while still integrating with the Data Views service (for example, use the index pattern and timestamp field right away but not offer autocompletion until later).

Would it be possible to make field resolution optional until requested?
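
For illustration, here is a rough sketch of the call flow we have in mind. None of these names exist today: the `resolveFields` option, `loadFields()`, and the `DeferredDataView`/`DataViewsServiceSketch` shapes are made up to show the idea.

```ts
// Hypothetical shapes only, to illustrate the request; none of this exists in Kibana today.
interface DeferredDataView {
  title: string;                // the index pattern, e.g. "logs-*"
  timeFieldName?: string;       // the timestamp field used for sorting/tiebreaking
  loadFields(): Promise<void>;  // resolves the field list on demand (via _field_caps)
}

interface DataViewsServiceSketch {
  get(id: string, options?: { resolveFields?: boolean }): Promise<DeferredDataView>;
}

async function mountLogsPage(dataViews: DataViewsServiceSketch, dataViewId: string) {
  // Resolve only the cheap metadata up front instead of waiting on _field_caps.
  const dataView = await dataViews.get(dataViewId, { resolveFields: false });

  // The index pattern and timestamp field are enough for the first render.
  console.log(dataView.title, dataView.timeFieldName);

  // Fetch the fields only later, e.g. once filter suggestions are needed.
  await dataView.loadFields();
}
```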

@miltonhultgren added the Feature:Metrics UI, Feature:Data Views, Feature:Logs UI, Team:Infra Monitoring UI - DEPRECATED, and Team:AppServicesSv labels Aug 24, 2022
@elasticmachine
Contributor

Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)

@elasticmachine
Contributor

Pinging @elastic/kibana-app-services (Team:AppServicesSv)

@mattkime
Contributor

> Would it be possible to make field resolution optional until requested?

It is, but I'd like to have a thorough understanding before this is implemented. Generally speaking, we expect field list loading to be fast, so I'm curious about the cases where this isn't true.

What is the priority on this? Is it tied to any high-priority items?

@weltenwort
Member

The _field_caps call that is performed to load the field list can take a while when using CCS. It is a common deployment topology for observability to have region-specific or team-specific monitoring/logging clusters and then to combine several of them via CCS into cross-region/cross-team clusters.

In those situations, get() becomes a bottleneck for the UI since it can wait tens of seconds until _field_caps returns and the data view instance is returned.
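
For context, the field loading boils down to a _field_caps request roughly like the following; with a CCS pattern it fans out to every remote cluster and only completes once the slowest one has answered (the endpoint and cluster names are just examples):

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' }); // example endpoint

async function loadFieldCaps() {
  // With CCS, the request is forwarded to each remote cluster and the response
  // can only be assembled once the slowest remote has replied.
  const response = await client.fieldCaps({
    index: 'logs-*,eu-cluster:logs-*,us-cluster:logs-*', // example cross-cluster pattern
    fields: '*',
  });
  console.log(`${Object.keys(response.fields).length} fields resolved`);
}
```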

@mattkime
Contributor

@weltenwort Are the speed concerns still an issue with the current state of _field_caps? elastic/elasticsearch#84504

As far as I know, there's no way it should be taking tens of seconds.

Taking a step back, I'm happy to provide data views with async field loading, but I want to make sure I understand our current limitations.

@weltenwort
Member

The _field_caps request with CCS probably needs to wait for the slowest cluster. This is an example trace with just one lightly loaded remote cluster:

[screenshot: APM trace of the data view load]

You can see that the call made while loading the data view takes 4 s with 3.3 s of that being taken up by the _field_caps call to ES.

@mattkime
Contributor

mattkime commented Aug 30, 2022

@weltenwort Which version of the stack is that? I'd also be interested in other relevant details: what does 'lightly loaded' mean? How many fields?

I'm being stubborn about this because _field_caps being relatively fast is a core assumption. Overturning that would involve a fair amount of work and therefore diligence. Maybe these are the first steps.

Ideally we'd be seeing sub-second responses.

@mattkime
Contributor

@dnhatn I noticed your work on benchmarks for the field caps API. Do we have a better idea of what we can expect performance-wise?

@matschaffer
Contributor

matschaffer commented Aug 31, 2022

Wondering if @pugnascotia 's ES tracing work might help confirm what we're waiting on for those 3.3s 🤔

@pugnascotia
Contributor

It would at least give you an idea of what tasks are being executed.

@weltenwort
Member

The clusters are managed by the observability dev productivity team's tooling. These are the details I could find, where "production" is the cluster that my Kibana instance runs on and "remote" is the cluster that is accessed via CCS:

production cluster

remote cluster

- Elasticsearch: elastic/elasticsearch@d0250dd
- realistic observability logs + metrics ingestion for a few example apps
- stable and healthy with 10 nodes and about 5500 indices

[screenshot: cluster overview]

Is there a way we can enable tracing on those clusters in a non-destructive way?

@dnhatn
Member

dnhatn commented Aug 31, 2022

I think 3 seconds is possible if the cluster has 1000+ indices. We have another optimization in elastic/elasticsearch#86323; however, it's still unmerged. I will try to get it in this week. This optimization should reduce the latency to sub-second.

@weltenwort
Member

weltenwort commented Aug 31, 2022

> I think 3 seconds is possible if the cluster has 1000+ indices.

Right, this is not about a few seconds being too slow when the request hits that number of indices. It's about not being able to avoid that wait when loading a data view, even when the component doesn't need the field list right away.

> This optimization should reduce the latency to sub-second.

That sounds amazing, thank you.

@mattkime
Contributor

I'm glad we had this discussion to help emphasize the importance of @dnhatn 's optimization work.

@javanna
Member

javanna commented Sep 6, 2022

Thanks for making this connection @mattkime. Please ping us whenever you hit this kind of problem around calling field_caps (or any other API, really); otherwise we don't even get to know that there are issues you folks are looking to work around :)

@miltonhultgren
Contributor Author

So is the conclusion that we aim to make field resolution so fast that it's not an issue to resolve the fields even when they're not needed at all times?
And do we expect that, even for CCS use cases, this will be fast enough not to block rendering noticeably?

@mattkime
Contributor

mattkime commented Sep 8, 2022

@miltonhultgren Yes, although these aren't necessarily mutually exclusive paths. What is the use case for loading a data view without the fields? I'd like to get into the details of what you're doing since I'll often learn something useful. Yes, I understand that initially you just need the index pattern and timestamp field but I'd still like to learn more.

It looks like the case that might have taken 3s will now take about 0.3s. Is 0.3s meaningful in this case? I'm unaware of time to load being optimized to this degree elsewhere.

All the data view code assumes the field list exists once a DataView instance has been instantiated. This would be a significant change. If we were to rewrite the data views code, I'd definitely defer loading the field list. I'm trying to figure out the priority of making this change.

@miltonhultgren
Contributor Author

> What is the use case for loading a data view without the fields?

We have two use cases today: one in Logs and one for a Lens table that shows host metrics. In Logs, we use the data view to resolve which indices to load logs from, and we use the timestamp field as a tie breaker for sorting (I think). In the Logs case we do want the fields, but at a later time, to suggest fields for filters or to change which fields to show from the log documents; this doesn't need to block the initial page load.

For the new Lens table, we don't need the fields at all since we simply want to load the right metrics from the right index, and no autocompletion needs to happen for that table (though later it might be filtered through unified search).

In the rest of the Metrics UI we follow a similar pattern: initially we only load the metrics from the right index and defer field resolution until a bit later, when it's needed.

So it really just boils down to wanting to defer work so that an initial render with useful data can happen sooner.
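
As a rough sketch of that deferral pattern (the `loadFieldsFor` helper is hypothetical, standing in for whatever ends up performing the _field_caps call, and the import path may differ by Kibana version):

```ts
import type { DataView, DataViewField } from '@kbn/data-views-plugin/common';

let fieldsPromise: Promise<DataViewField[]> | undefined;

// Render immediately with what the data view already knows (indices, timestamp field)
// and only trigger the field load once something actually needs the field list.
function ensureFields(dataView: DataView): Promise<DataViewField[]> {
  fieldsPromise ??= loadFieldsFor(dataView); // kicked off at most once, never blocks the first render
  return fieldsPromise;
}

// Hypothetical helper wrapping the _field_caps-backed field resolution.
declare function loadFieldsFor(dataView: DataView): Promise<DataViewField[]>;
```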

> Is 0.3s meaningful in this case?

No, I don't think so.

> This would be a significant change.

Understood. I think we'd do best to wait and see how the optimization performs, especially in CCS setups with slower networks/remotes, and at what percentile we might still see such load times. We'll also need to gather more accurate data on this, preferably from real deployments that are properly sized for the workload (the Edge cluster isn't).

@mattkime
Contributor

Sounds good. I'll think about how we might do this as smaller efforts instead of one big push.

@exalate-issue-sync bot added the impact:needs-assessment and impact:medium labels and removed the impact:needs-assessment label Sep 19, 2022
@smith removed the Team:Infra Monitoring UI - DEPRECATED label Sep 19, 2022
@dnhatn
Member

dnhatn commented Nov 3, 2022

I have merged elastic/elasticsearch#86323. I think it should unblock the work here.

@kertal added the bug label Nov 18, 2022
@petrklapka added the Team:DataDiscovery label and removed the Team:AppServicesSv label Nov 28, 2022
@elasticmachine
Contributor

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

@kertal
Member

kertal commented Feb 1, 2023

So the ask would be to, for example, add a param to dataViewsService.get(dataViewId) that allows getting the data view without resolving all fields, or to create a separate function like getWithoutFields, right? That sounds like a feature request more than a bug.
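
Purely as an illustration, the two options could look something like this (neither signature exists today):

```ts
import type { DataView } from '@kbn/data-views-plugin/common';

interface DataViewsServiceProposal {
  // (a) opt out of field resolution via a parameter on the existing method
  get(dataViewId: string, options?: { skipFields?: boolean }): Promise<DataView>;

  // (b) a dedicated method that never calls _field_caps
  getWithoutFields(dataViewId: string): Promise<DataView>;
}
```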

@kertal removed the bug label Feb 1, 2023
@StephanErb

I thought I'd share a bit of experience from the field: we have now updated our production cluster to 8.6.1 with the latest field caps improvement (elastic/elasticsearch#86323). Unfortunately, performance is still not optimal for us: field_caps for metricbeat* is still in the 10-30s range. This runtime appears to be mostly dominated by frozen nodes with 800-1000 shards. The mappings are dynamic, so most indices have slightly different mappings. The mappings have several thousand fields.

I also fear that the problem will get worse with TSDB and synthetic source. With the good compression ratio of TSDB combined with the planned primary shard cap at 200M documents (elastic/elasticsearch#87246), a single frozen node will be holding significantly more shards in the future. I would thus expect performance to deteriorate further.

@mattkime
Contributor

@StephanErb

> This runtime appears to be mostly dominated by frozen nodes with 800-1000 shards.

Having frozen indices within the metricbeat-* index pattern isn't something we've worked on, as placing data on frozen tiers is a choice to lower cost at the expense of speed.

I think the solution should be to make sure the frozen indices are not available to the metricbeat-* index pattern. Is this possible? Is something in the way?

@StephanErb

StephanErb commented Feb 20, 2023

> Having frozen indices within the metricbeat-* index pattern isn't something we've worked on, as placing data on frozen tiers is a choice to lower cost at the expense of speed.

I would expect that querying data on frozen nodes leads to a slowdown. However, the mere presence of data on the frozen tier outside of the queried time range should not have a performance impact. At least, that's what my team and I have assumed so far.

We have Kubernetes and Prometheus metrics in metricbeat-*. We use an ILM policy such that after 2 days data is transitioned from hot to warm, after 7 days from warm to cold, and finally after 30 days from cold to frozen. Most of our alerts and dashboards look at the last 24 hours of data. Dashboards occasionally also look at 7 and 30 days, as those are the default time filters in Kibana. Ranges >30 days are almost never queried.

Given that a field_caps query does not contain a time range parameter, I can somewhat see where the problem is coming from. However, as frozen nodes are not used for indexing new data, the fields on them should be rather static and hopefully cacheable.

@javanna
Member

javanna commented Feb 20, 2023

Field_caps does support providing a time range filter, and runs the can_match phase to filter out irrelevant shards.
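
For example, via the Elasticsearch JS client (the endpoint and the 30-day range below are just placeholders):

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' }); // example endpoint

async function fieldCapsForRecentData() {
  // index_filter lets the can_match phase skip shards (e.g. frozen indices)
  // whose data falls entirely outside the requested time range.
  return client.fieldCaps({
    index: 'metricbeat-*',
    fields: '*',
    index_filter: {
      range: { '@timestamp': { gte: 'now-30d' } },
    },
  });
}
```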

> The mappings are dynamic so most indices have slightly different mappings.

I suspect this is the main issue, as the performance improvements build on deduplication of mappings that share the same hash, which does not apply if there are slight differences between the indices.

Field_caps performance does not depend on the number of shards, though, but rather on the number of indices with distinct mappings. It would be great to get more feedback here to see what we can improve further. Could you open an SDH around this?

@davismcphee added the loe:needs-research label Sep 8, 2023
@gbamparop added the Team:obs-ux-logs label Nov 9, 2023
@elasticmachine
Contributor

Pinging @elastic/obs-ux-logs-team (Team:obs-ux-logs)

@gbamparop added the Team:obs-ux-infra_services label Nov 9, 2023
@elasticmachine
Contributor

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

@kertal
Member

kertal commented Nov 9, 2023

Yes, we intend to do this. The next step in this direction will be #167750.

@mattkime
Contributor

mattkime commented May 3, 2024

@miltonhultgren DataViewLazy has been partially implemented. Can you look and see if it's useful for your needs? Fields are only loaded as requested, potentially saving a lot of overhead compared to regular DataViews.
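
Roughly, consumption looks like this (a sketch only; the exact method names, option shapes, and import paths may differ from the current implementation):

```ts
import type { DataViewsPublicPluginStart } from '@kbn/data-views-plugin/public';

async function loadLazily(dataViews: DataViewsPublicPluginStart, dataViewId: string) {
  // Resolving the data view itself no longer waits on a full _field_caps call.
  const dataViewLazy = await dataViews.getDataViewLazy(dataViewId);

  // The index pattern and timestamp field are available immediately.
  const indexPattern = dataViewLazy.getIndexPattern();
  const timeField = dataViewLazy.timeFieldName;

  // Fields are fetched on demand, and only the ones the caller asks for.
  const fields = await dataViewLazy.getFields({ fieldName: ['host.name', 'cloud.*'] });

  return { indexPattern, timeField, fields };
}
```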

@miltonhultgren
Contributor Author

I'm no longer involved in the apps where we used DataViews that led to me opening this issue.

@weltenwort @neptunian Is this something that you guys could look at within the current logs and metrics code bases?

@weltenwort
Member

Thanks for the pointer. We have #179128 to track its usage in the log threshold alert.

@kertal
Member

kertal commented Jul 10, 2024

@mattkime I think we can close this due to #167750?
