[APM] Investigate using transforms for APM UI data #74498

Closed
dgieselaar opened this issue Aug 6, 2020 · 10 comments
Labels
enhancement New value added to drive a business result Team:APM All issues that need APM UI Team support

@dgieselaar
Member

dgieselaar commented Aug 6, 2020

We are currently experimenting with APM Server creating transaction duration metrics from transaction events. See elastic/apm#104 for more details.

These metric documents are more efficient in terms of storage, and in some cases, will speed up our APM UI requests as well. However, they serve a broad purpose (almost every use case in the APM UI), and if we create more specific metrics for certain views, we can make bigger gains. In one example (based on real-world data), the generic transaction duration metrics aggregation creates about 1 metric document for every 7 transaction events, whereas metrics that only support the service overview would create 1 metric document for every 6,000 transaction events. Elasticsearch supports transforms, which could allow us to easily create these metrics.
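The gain comes from cardinality: each metric document represents one unique combination of grouping dimensions per time bucket, so dropping dimensions a view does not need collapses many events into one document. The sketch below counts the metric documents a given grouping would produce. The flattened field names, service names and the one-minute interval are assumptions for illustration, and the 7:1 and 6,000:1 ratios above come from real-world data that this toy example does not try to reproduce.

```typescript
// A flattened transaction event; field names are assumed ECS-style APM fields.
type TransactionEvent = Record<string, string | number> & { "@timestamp": number };

// Counts how many metric documents an aggregation would emit: one per unique
// combination of the grouping dimensions within each fixed time bucket.
function countMetricDocs(
  events: TransactionEvent[],
  dimensions: string[],
  intervalMs: number,
): number {
  const keys = new Set<string>();
  for (const event of events) {
    const bucket = Math.floor(event["@timestamp"] / intervalMs);
    const dimValues = dimensions.map((d) => String(event[d]));
    keys.add(JSON.stringify([...dimValues, bucket]));
  }
  return keys.size;
}

const events: TransactionEvent[] = [
  { "service.name": "opbeans-java", "transaction.name": "GET /api/orders", "@timestamp": 0 },
  { "service.name": "opbeans-java", "transaction.name": "GET /api/products", "@timestamp": 1_000 },
  { "service.name": "opbeans-node", "transaction.name": "GET /api/orders", "@timestamp": 2_000 },
  { "service.name": "opbeans-java", "transaction.name": "GET /api/orders", "@timestamp": 70_000 },
];
const minute = 60_000;
console.log(countMetricDocs(events, ["service.name", "transaction.name"], minute)); // 4
console.log(countMetricDocs(events, ["service.name"], minute)); // 3
```

With more transaction names per service and more events per minute, the gap between the two counts grows accordingly.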

Possible benefits are:

  • Storage cost: these metrics are extremely efficient in terms of storage because they only store the data we need in the UI. They could be retained for longer, at a lower cost.
  • Performance improvements: because these metrics generate significantly fewer documents, searches and aggregations should be significantly faster.
  • Backwards compatible: if we create a new metric or change an existing one, simply re-installing the transform should give us metrics for historical data as well (depending on how much of the raw data is still retained).
  • Easier integration with other Kibana apps: some of our charts require post-processing or complicated queries, which makes them hard to visualise in other Kibana apps. Pre-aggregating this data could make this more straightforward. We could also more easily leverage things like search strategies and embeddables.
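For reference, a service overview metric could be expressed as an Elasticsearch pivot transform roughly like the request body below, sent to `PUT _transform/<transform_id>`. This is a sketch only: the source index pattern, destination index name, aggregation names and the fields `processor.event` and `transaction.duration.us` are assumptions about the APM data layout, not a tested configuration. A newly created continuous transform first processes the matching data already in its source indices and then keeps up via checkpoints, which is why re-creating it should backfill historical metrics for whatever raw data is still retained.

```typescript
// Hypothetical request body for PUT _transform/<transform_id>; index names and
// fields are assumptions about the APM data layout, not a tested configuration.
interface TransformBody {
  source: { index: string[]; query: object };
  dest: { index: string };
  frequency: string;
  sync: { time: { field: string; delay: string } };
  pivot: { group_by: Record<string, object>; aggregations: Record<string, object> };
}

function serviceOverviewTransform(sourceIndex: string, destIndex: string): TransformBody {
  return {
    source: {
      index: [sourceIndex],
      // Only transaction events feed this metric.
      query: { term: { "processor.event": "transaction" } },
    },
    dest: { index: destIndex },
    // How often the continuous transform checks the source for changes.
    frequency: "1m",
    // Continuous mode: detect new documents by timestamp, allowing 60s for ingest delay.
    sync: { time: { field: "@timestamp", delay: "60s" } },
    pivot: {
      // Keep only the dimensions the service overview needs, bucketed per minute.
      group_by: {
        "service.name": { terms: { field: "service.name" } },
        "@timestamp": { date_histogram: { field: "@timestamp", fixed_interval: "1m" } },
      },
      aggregations: {
        avg_duration_us: { avg: { field: "transaction.duration.us" } },
        max_duration_us: { max: { field: "transaction.duration.us" } },
        transaction_count: { value_count: { field: "transaction.duration.us" } },
      },
    },
  };
}

const body = serviceOverviewTransform("apm-*-transaction-*", "apm-ui-service-overview");
console.log(JSON.stringify(body, null, 2));
```

Latency percentiles are left out on purpose: as noted below, transforms cannot produce the HDR histograms the UI relies on.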

Here's a rough idea for what questions a POC should aim to answer:

  • Which visualisations in our UI would benefit from transforms? We can select two or three for this POC.
  • What pieces are missing? E.g., ES transforms don't support creating HDR histograms. Also, the kibana_system user doesn't have the appropriate permissions to manage the transforms or indices. What else?
  • What's the cost of trying to support multiple layers of data? Ideally we would show UI metrics first, then allow the user to drill down into higher-fidelity data (for instance, when they use the query bar). Is that doable?
  • Can we more easily create dashboards or leverage concepts like Kibana's search strategies?
  • What's the performance gain and the storage savings?
  • What role should rollups play?
  • If we install a transform, can we configure it so that newer data is processed first? This would mean that the user doesn't have to wait until the transform is caught up before using the UI.
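One way to approach the multi-layer question is to route each request. Use the pre-aggregated UI metrics when every field the user filtered on is a dimension the transform kept and the transform has processed the requested range, and fall back to raw transaction events otherwise. The sketch below is hypothetical: the function and type names, the dimension set, and the assumption that filtered fields and the transform's processed-until timestamp are available upstream are all illustrative. It does not address processing newer data first.

```typescript
// Which layer of data answers a request (hypothetical names).
type DataLayer = "ui_metrics" | "transaction_events";

interface UiRequest {
  // Fields referenced by the query bar or other filters; assumed to be extracted upstream.
  filteredFields: string[];
  // End of the requested time range, in epoch milliseconds.
  rangeEnd: number;
}

// Dimensions a hypothetical service overview transform keeps.
const TRANSFORM_DIMENSIONS = new Set(["service.name", "service.environment"]);

// `processedUntil` is the latest timestamp the transform has processed, assumed to be
// read from its checkpoint information.
function chooseDataLayer(req: UiRequest, processedUntil: number): DataLayer {
  // A filter on a dimension the transform dropped cannot be answered from its output.
  const covered = req.filteredFields.every((field) => TRANSFORM_DIMENSIONS.has(field));
  // The transform lags behind ingestion, so the newest data may be missing from its output.
  const caughtUp = req.rangeEnd <= processedUntil;
  return covered && caughtUp ? "ui_metrics" : "transaction_events";
}

const checkpoint = Date.parse("2020-08-06T12:00:00Z");
console.log(chooseDataLayer({ filteredFields: ["service.environment"], rangeEnd: checkpoint }, checkpoint)); // ui_metrics
console.log(chooseDataLayer({ filteredFields: ["transaction.name"], rangeEnd: checkpoint }, checkpoint)); // transaction_events
```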

Some possible UI metrics we can create:

  • Service overview metrics
  • Derived service annotations
  • Transaction breakdown data
  • Garbage collection metrics
  • A list of services (to be used in various configuration wizards)
@dgieselaar dgieselaar added Team:APM All issues that need APM UI Team support enhancement New value added to drive a business result v7.11.0 labels Aug 6, 2020
@elasticmachine
Contributor

Pinging @elastic/apm-ui (Team:apm)

@cauemarcondes cauemarcondes changed the title [APM] Investigate using transforms for APM UI data [POC][APM] Investigate using transforms for APM UI data Aug 10, 2020
@felixbarny
Member

An alternative that could be considered is for APM Server to "collapse" some dimensions that are known to have high cardinality and are not needed for many aggregations.

Looking at https://github.com/elastic/apm-server/blob/5cb4101d705effdf2f54e2e45847ee92033e806e/x-pack/apm-server/aggregation/txmetrics/aggregator.go#L414, the service map or the service landing page probably only need the service.name and service.environment dimensions.

The server could collect aggregate metrics that set a special value, for example, service.name: _all, to group the metrics of all services together. That way, the UI only has to read one time series instead of aggregating across services on the fly.

It should be fairly simple to do that server-side, wouldn't require significantly more memory, and the metrics would be instantly available, without a delay.
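The following is a minimal sketch of the server-side idea, assuming an in-memory aggregator keyed by service.name and service.environment (the two dimensions mentioned above). Every event updates its own series and one extra series where service.name is replaced by the special value _all, so a cross-service chart reads a single pre-combined series. This is not APM Server's actual aggregator; the types and names are hypothetical, and it is written in TypeScript only to keep one language across these examples.

```typescript
// Hypothetical event and series shapes for illustration.
interface TxEvent { serviceName: string; serviceEnvironment: string; durationUs: number }
interface Series { count: number; sumDurationUs: number }

const ALL_SERVICES = "_all";

class CollapsingAggregator {
  private series = new Map<string, Series>();

  add(event: TxEvent): void {
    // Update the per-service series and the collapsed "_all" series for the same environment.
    for (const serviceName of [event.serviceName, ALL_SERVICES]) {
      const key = JSON.stringify([serviceName, event.serviceEnvironment]);
      const s = this.series.get(key) ?? { count: 0, sumDurationUs: 0 };
      s.count += 1;
      s.sumDurationUs += event.durationUs;
      this.series.set(key, s);
    }
  }

  // Returns the accumulated series for one service (or "_all") in one environment.
  get(serviceName: string, serviceEnvironment: string): Series | undefined {
    return this.series.get(JSON.stringify([serviceName, serviceEnvironment]));
  }

  // Number of series currently held in memory.
  get size(): number {
    return this.series.size;
  }
}

const agg = new CollapsingAggregator();
agg.add({ serviceName: "opbeans-java", serviceEnvironment: "production", durationUs: 1000 });
agg.add({ serviceName: "opbeans-node", serviceEnvironment: "production", durationUs: 3000 });
console.log(agg.get("_all", "production")); // { count: 2, sumDurationUs: 4000 }
console.log(agg.size); // 3
```

The memory overhead is one extra series per environment per aggregation interval, which fits the expectation that this would not need significantly more memory.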

I used this technique in my previous project to significantly speed up aggregate graphs.

@dgieselaar
Member Author

@felixbarny Do you mean that APM Server would create the specific metrics this POC aims to create via transforms? E.g., a service overview metric that doesn't record transaction.name, transaction.type, etc., so it's more efficient?

FWIW, I don't think APM Server is ideal here. The fact that aggregation happens per-instance means that the efficiency of recorded metrics will always be more limited than using ES for those aggregations.
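The per-instance limitation can be made concrete. When each APM Server instance aggregates only the events it receives, the same service and time bucket yield one metric document per instance that saw it, whereas aggregating all events centrally (for example in Elasticsearch) yields one document per key. The sketch counts both; the event shape, host names and key format are illustrative assumptions.

```typescript
// Hypothetical shape: which server instance received an event, and its grouping key.
interface RoutedEvent {
  instanceId: string;
  key: string; // e.g. service name + time bucket
}

// Documents emitted when each server instance aggregates independently.
function perInstanceDocCount(events: RoutedEvent[]): number {
  return new Set(events.map((e) => JSON.stringify([e.instanceId, e.key]))).size;
}

// Documents emitted when aggregating centrally over all events.
function centralDocCount(events: RoutedEvent[]): number {
  return new Set(events.map((e) => e.key)).size;
}

// One service, one minute, events spread over three per-host servers.
const routed: RoutedEvent[] = ["host-1", "host-2", "host-3", "host-1"].map((instanceId) => ({
  instanceId,
  key: "opbeans-java@2020-08-06T12:00",
}));
console.log(perInstanceDocCount(routed)); // 3
console.log(centralDocCount(routed)); // 1
```

With a server per host, the per-instance count grows with the number of hosts a service runs on, while the central count stays fixed.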

@felixbarny
Member

> Do you mean that APM Server would create the specific metrics this POC aims to create via transforms?

Yes, that's what I meant.

> The fact that aggregation happens per-instance means that the efficiency of recorded metrics will always be more limited than using ES for those aggregations.

Excellent point. It's probably not a big issue when you just have a couple of central APM Servers, but with the server-per-host model we're heading towards, it is an issue.

@sophiec20
Contributor

ping @elastic/ml-core for visibility

@dgieselaar dgieselaar changed the title [POC][APM] Investigate using transforms for APM UI data [APM] Investigate using transforms for APM UI data Oct 14, 2020
@sorenlouv sorenlouv added v7.13.0 and removed v7.12.0 labels Jan 12, 2021
@sorenlouv
Member

@dgieselaar What do you think about tackling this for 7.13?

@dgieselaar
Member Author

@sqren sounds good to me!

@botelastic

botelastic bot commented Feb 9, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@botelastic botelastic bot added stale Used to mark issues that were closed for being stale and removed stale Used to mark issues that were closed for being stale labels Feb 9, 2022
@felixbarny
Member

Entity extraction is one of the top asks that we have for the platform team.

They want to know what's missing in transforms for us to be able to use them. I've summarized some raw feedback in a dedicated section of the Stream processing use cases document.

Note that it seems like the security team has been able to successfully adopt transforms. See https://github.com/elastic/security-team/issues/157.

@dgieselaar
Member Author

Not planned.

Development

No branches or pull requests

6 participants