-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Metrics UI] Enhanced host details - Processes - Additional Features #83968
Comments
Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui) |
@elastic/apm @sqren @sorantis @nehaduggal can any of you help us understand the "View Trace in APM" requirement in this ticket? How can we most easily/safely link to APM from a running process with UPDATE: I just saw this comment -> #80307 (comment) It looks like there is still some ambiguity about where exactly to link? |
After meeting with @hbharding, @sorantis, @simianhacker, @Zacqary, @phillipb, and @katefarrar about this work, we decided to make a few performance and UX optimizations. Step 1 (these will be added to the AC for this ticket):
Step 2 (this will be moved to a new, separate ticket):
|
In that discussion we talked about linking to a service given a
If we want to link to a specific trace some additional info is needed. Obviously if you have access to a |
@jasonrhodes The sparkline provides a quick overview of recent process behavior and the plan was to standardize the use of sparklines for other table views, e.g. containers, pods, processes, and (in the future) functions. We're already using it in other places, like APM. I'd not consider removing it until we have a good understanding of the actual performance impact. |
Can we start by getting a performance baseline of the current query vs new ones?
|
@sorantis if we don't get rid of the sparklines we are sort of back at square one on what to do with this ticket. The performance complexity that the sparklines introduce is exponential because of max bucket size problems. I think choosing to standardize on those may end up creating a ton of issues across the app, if we aren't very careful. @sgrodzicki yeah we can do some query comparisons. I was hoping we could do something quick and easy now and do something more holistic after that, but it sounds like we need more information. I'll work with @Zacqary to look at those query numbers and we'll report back. |
@jasonrhodes correct, @alex-fedotyev and I discussed this and came to the conclusion that linking to a service is better than to a particular trace. |
OK so for the query performance issues, let's do what @sgrodzicki suggested before we make any changes to the UI/queries. To do this, I'd like to see a full example query for each of the following 4 scenarios attached to this ticket, so we can reference them later.
Then we can profile those queries against the dev-next cluster and see the differences in timing. That's still just one somewhat arbitrary set of data, but it's at least a start so we can understand which decision makes sense for now, as well as for moving forward. Let me know if that doesn't make sense. |
It doesn't. But it should be trivial to implement if this is what you need. |
1. Current Request (from /api/infra/metrics_api)This request is made via the new Metrics API. It is called multiple times with the
2. Current query (without data for sparklines)This query doesn't really exist but I imagine this is what it would look like. It would need to be a custom query that uses the same mechanism to retrieve all the results and sort in memory just like the Metrics API request.
3. Top N for Process with Sparklines
4. Top N for Process without Sparklines
|
Tested this on the
For this test, I reduced the These took between 5-10 seconds, though this was after running the first test, so it may still have been benefiting from whatever caching caused the sparkline-compatible query to drastically speed up after multiple refreshes Will post results of the TOP queries after testing them. |
I updated the ranges to reflect what we are doing in production. Everything with the data histogram should be the last 15 minutes and anything without should be the last 1 minute. To be fair, we should probably make 2 requests. One for the summary data for the last 1 minute and one for the sparklines which is the last 15 minutes. |
@Zacqary On the subsequent requests for |
@simianhacker I didn't, I just kept refreshing the page because I wanted to get an average, but then it turned out it sped up. This was done through the Inventory view piping it through the Metrics API instead of using the Dev Tools to make a query directly, measuring the XHR time. |
Ran this in the Dev Tools. This took about 10 seconds initially but only 100ms on subsequent requests.
Between 50-100ms in the Dev Tools, but I'm not sure if that's accurate since it's probably taking advantage of Query 3's cache. EDIT: I ran Query 4 again using a timerange from several weeks ago and it took 790ms, so that's probably a more accurate non-cached reading. |
@Zacqary I think you can break the cache by changing the host name |
@simianhacker Tried that; only playing with the date broke the cache |
Updated the AC just now to make sure we group by |
@Zacqary in case you didn't try yet, you can also use You could perhaps also use We're also using sparklines in one of our new pages. I had to separately fetch the sparklines data because I was running into the too_many_buckets exception. |
OK so 5-10 seconds for "current query as-is, without sparklines" is still pretty not great, especially compared to the Top N queries without sparklines (which I agree, should definitely be queried separately, if we're doing them). Is there a real reason we want to preserve the pagination for this UX or are we just sticking to it because we had it originally? If there isn't a great reason to preserve it, I think we should move forward with Top 10 (or Top 15) and just let clicking on the headers sort with a new query. Then we should log a separate ticket to do the sparklines query separately and add those back into the design. We can always rethink the sorting later, as well. We'll need help from @hbharding to figure out how to show people we are showing them Top 10 Memory, Top 10 CPU, but I don't think we should get too hung up on the sorting. cc @sorantis @simianhacker @sgrodzicki Thoughts? |
I'm focusing on the View Trace in APM functionality now, and I want to clarify what we're expecting. Should this button still say "Trace" if it's linking to a service, and not a trace? Or is there a way that we can fetch a relevant Also, I'm testing on |
Okay, I've split the APM button into a separate issue in that case: #84849 |
Some acceptance criteria from #80307 still needs to be met:
Note: We are NOT going to implement the sparkline visualizations from the design, to avoid performance bucketing problems.
Implement the View Trace in APM buttonNow handled by [Metrics UI] Process List - Add View in APM button #84849Discussion about how to link to APM is partially here -> #80307 (comment)
Group queries byHolding off on this; seems to be buggyprocess.command_line
instead ofsystem.process.cmdline
for proper ECS adherence (Duplicate system.process.cmdline field with process.command_line ECS field name beats#22325)The text was updated successfully, but these errors were encountered: