feat(rum): categorize transactions based on current url #827

vigneshshanmugam · 2020-06-26T14:19:05Z

fix Provide a default name for page load transactions #255
fix use pathname as transaction name for route change #297
solution for Handling huge cardinality of page load transaction names #56
This PR addresses the high-cardinality problem by using a heuristic-based approach to shorten the page URL. The shortened version lets us name transactions in a way users can understand, rather than leaving them as UNKNOWN.
The slugify algorithm uses a modified version of prior work done by @jahtalab.
How it works:
1. The depth of the path tree is always set to 2, which means /a/b/c/d becomes /a/b/*. The depth is 2 rather than 1 because most websites use the locale as the first level of the URL tree, so we want to categorize them better up front instead of just printing locales in the transaction name.
2. Trailing slashes do not create a new depth.
3. Path segments made up of digits are redacted, and whatever follows them is collapsed, e.g. /1234/product -> /:id/*
4. Path segments with more than 2 special characters are redacted.
5. Path segments that mix upper-case and lower-case characters are redacted.

Please see the test cases for a more in-depth overview.

The algorithm seems to work for most of the websites I have tested using a crawler. The setup is here if you want to play around.
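
To make the five rules easier to follow, here is a minimal sketch of how they could fit together. It is not the code from packages/rum-core/src/common/slugify.js: the names `slugifyPath` and `looksLikeId` are hypothetical, and the exact thresholds (any digit, more than 2 special characters, any mix of upper and lower case) are assumptions taken from a literal reading of the list, so the real heuristics may well be more forgiving.

```javascript
// Hypothetical sketch of the rules described above; not the PR's implementation.
// `depth` defaults to 2 as in the description and could be raised by a caller.
function slugifyPath(pathname, depth = 2) {
  // Dropping empty parts means leading and trailing slashes never add a level.
  const parts = pathname.split('/').filter(Boolean)
  const out = []

  for (let i = 0; i < parts.length; i++) {
    // Anything beyond the allowed depth collapses into a single wildcard.
    if (i >= depth) {
      out.push('*')
      break
    }
    if (looksLikeId(parts[i])) {
      out.push(':id')
      // Whatever follows a redacted segment is collapsed as well.
      if (i < parts.length - 1) out.push('*')
      break
    }
    out.push(parts[i])
  }
  return '/' + out.join('/')
}

// Assumed thresholds, chosen only for illustration.
function looksLikeId(part) {
  const count = re => (part.match(re) || []).length
  // Rule 3 (assumption: any digit marks the segment as an identifier).
  if (count(/[0-9]/g) > 0) return true
  // Rule 4: more than 2 special (non-alphanumeric) characters.
  if (count(/[^A-Za-z0-9]/g) > 2) return true
  // Rule 5 (assumption: any mix of upper and lower case counts).
  if (count(/[A-Z]/g) > 0 && count(/[a-z]/g) > 0) return true
  return false
}
```

With these assumptions, `slugifyPath('/a/b/c/d')` gives `/a/b/*` and `slugifyPath('/1234/product')` gives `/:id/*`, matching the examples in the description. The naive rules also redact segments such as `v1` or `Product`, which is one reason a real implementation would likely use ratios or length limits instead.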

packages/rum-core/src/common/slugify.js

apmmachine · 2020-06-26T14:49:31Z

💚 Build Succeeded

Build stats

Build Cause: [Pull request #827 updated]
Start Time: 2020-07-03T08:50:52.748+0000
Duration: 76 min 16 sec

Test stats 🧪

Test	Results
Failed	0
Passed	968
Skipped	10
Total	978

Steps errors

Name: Bundlesize
- Description: #!/bin/bash set -o pipefail npm run bundlesize|tee bundlesize.txt
- Duration: 1 min 30 sec
- Start Time: 2020-07-03T08:56:34.030+0000
- log

axw

Seems like a decent approach to me, particularly because it's easy to override in cases where this approach doesn't work.

What do you think about adding the function to the agent API, so folks can call it directly with a greater depth as needed?

packages/rum-core/src/common/slugify.js

vigneshshanmugam · 2020-06-29T08:21:52Z

@axw Good suggestion, I am just worried about one thing though. If it creates too much cardinality in transaction names, our metrics-based approach in the APM UI might suffer as a consequence of this change.

hmdhk

@vigneshshanmugam, I'm a bit concerned about the performance impact of this; we should run some benchmarks to make sure. We should probably use 3 URL parts instead of 2, though it's only my gut feeling that 2 is too small 😃

Also I think we can move this into utils, I don't think a separate file is needed.

packages/rum-core/src/common/slugify.js

hmdhk · 2020-06-29T15:05:24Z

@axw, we've discussed the approach here and it seems there are some benefits to putting this logic in APM Server, since the logic here is used in aggregation and that should be tied to the stack version. WDYT?

This logic is also useful for replacing the unknown transactions; maybe that can also be done in APM Server, or we can keep it in the agent.

axw · 2020-06-30T02:04:03Z

@vigneshshanmugam

> Good suggestion, I am just worried about one thing though. If it creates too much cardinality in transaction names, our metrics-based approach in the APM UI might suffer as a consequence of this change.

Fair enough. I'm a bit worried that people who can't use this approach, with a fixed depth of 2 (or 3), will revert to using the whole path. I guess we can deal with that later though, if it does become a problem.

@jahtalab

> We've discussed the approach here and it seems there are some benefits to putting this logic in APM Server, since the logic here is used in aggregation and that should be tied to the stack version. WDYT?

What are those benefits? Performance? I'd be interested to know what the overhead of this is; it's only for page loads, so doesn't seem like it would be significant. On the other hand, pushing it to the server means concentrating the performance cost, which might end up very expensive when your application has many users.

To me it feels simpler overall to set it in the agent, as it's just another agent as far as the server is concerned. Setting it in the agent provides greater flexibility, whereas the server would have only limited configuration; we wouldn't be adding arbitrary script execution. I'd also be a bit wary of creating an expectation of transaction name overriding for non-RUM, which won't work because of client-side breakdown aggregations.

hmdhk · 2020-06-30T07:52:45Z

@axw, the main benefit of having this in APM Server is alignment with the stack version release. In other words, since this is used by APM Server for aggregation, any changes to this logic will affect the results that are stored in ES and therefore should be released with proper versioning (e.g. if there's a breaking change). If this doesn't seem like an issue to you, I'm OK with moving forward with the current implementation.

The performance on the agent side is not a big concern (although we will verify this with numbers).

axw · 2020-06-30T08:02:19Z

Transaction names have always been used for grouping and aggregation (in the UI), so I don't see a need to tie this to a stack version. I'm happy to continue with the current approach.

vigneshshanmugam · 2020-06-30T08:19:28Z

> Fair enough. I'm a bit worried that people who can't use this approach, with a fixed depth of 2 (or 3), will revert to using the whole path. I guess we can deal with that later though, if it does become a problem.

Yeah, we can detect whole-path usage, apply the same redaction logic, and provide a warning in the agent itself.

> The performance on the agent side is not a big concern (although we will verify this with numbers).

I will post the numbers. But last time I checked, it wasn't hitting us hard.

vigneshshanmugam · 2020-06-30T10:18:30Z

Benchmark numbers

Unique transaction count -  ~650000
Unique transaction Names generated - 72

Here are the results

{ 
  mean: 0.004329631194898993,
  median: 0.001528,
  p75: 2.633728,
  p90: 2.633728,
  p99: 2.633728 
}

The impact is minimal and it will not affect the agent code much, as we would still ask users to set pageLoadTransactionName if they need more granular control.
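
The thread does not say how these numbers were collected, so the following is only a sketch of one way to produce a summary of the same shape. The names `benchmark`, `summarize` and `nameFor` are hypothetical, the percentiles use the nearest-rank method (an assumption), and `performance.now()` is assumed to be available as a global, as it is in browsers and recent Node.js versions.

```javascript
// Hypothetical harness; `nameFor` stands in for whatever naming function is under test.
function summarize(samples) {
  const sorted = [...samples].sort((a, b) => a - b)
  // Nearest-rank percentile: smallest sample with at least p% of values at or below it.
  const pct = p => sorted[Math.max(0, Math.ceil((p / 100) * sorted.length) - 1)]
  const mean = sorted.reduce((sum, v) => sum + v, 0) / sorted.length
  return { mean, median: pct(50), p75: pct(75), p90: pct(90), p99: pct(99) }
}

function benchmark(urls, nameFor) {
  const names = new Set() // distinct transaction names produced
  const timings = [] // one duration per URL; performance.now() reports milliseconds
  for (const url of urls) {
    const start = performance.now()
    const name = nameFor(url)
    timings.push(performance.now() - start)
    names.add(name)
  }
  return { uniqueNames: names.size, stats: summarize(timings) }
}
```

Calling `benchmark(listOfUrls, someNamingFunction)` returns both the number of distinct names and the timing summary, which mirrors the two figures reported above: how far the naming collapses cardinality and what it costs per call.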

apmmachine · 2020-06-30T11:34:49Z

📦 Bundlesize report

Filename	Size(bundled)	Size(gzip)	Diff(gzip)
elastic-apm-opentracing.umd.min.js	60.1 KiB	19.3 KiB	⚠️ 271 Bytes
elastic-apm-rum.umd.min.js	54.2 KiB	17.9 KiB	⚠️ 267 Bytes

hmdhk

@vigneshshanmugam what do you think about moving slugify into utils?

packages/rum-core/src/performance-monitoring/transaction-service.js

packages/rum-core/src/common/slugify.js

packages/rum-core/test/common/slugify.spec.js

vigneshshanmugam · 2020-07-01T13:34:21Z

@jahtalab I thought about it. The utils file is already huge, and putting this functionality there would make it even bigger and a bit harder to skim. I am in favor of keeping it in a separate file as it's cleaner.

codecov-commenter · 2020-07-01T13:57:48Z

Codecov Report

Merging #827 into master will increase coverage by 0.04%.
The diff coverage is 91.66%.

@@            Coverage Diff             @@
##           master     #827      +/-   ##
==========================================
+ Coverage   92.94%   92.99%   +0.04%     
==========================================
  Files          50       51       +1     
  Lines        2283     2311      +28     
  Branches      458      466       +8     
==========================================
+ Hits         2122     2149      +27     
- Misses        158      159       +1     
  Partials        3        3

Impacted Files	Coverage Δ
packages/rum-core/src/bootstrap.js	`35.29% <0.00%> (-2.21%)`	⬇️
packages/rum-core/src/common/patching/index.js	`100.00% <ø> (ø)`
...c/performance-monitoring/performance-monitoring.js	`94.90% <ø> (ø)`
...e/src/performance-monitoring/capture-navigation.js	`100.00% <100.00%> (ø)`
packages/rum-core/src/state.js	`100.00% <100.00%> (ø)`
packages/rum-core/src/common/slugify.js	`100.00% <0.00%> (ø)`
.../src/performance-monitoring/transaction-service.js	`90.30% <0.00%> (+0.11%)`	⬆️

hmdhk

We decided to move slugify to url.js

packages/rum-core/test/common/slugify.spec.js

packages/rum-core/src/performance-monitoring/transaction-service.js

vigneshshanmugam · 2020-07-03T09:12:34Z

I will do the docs change in a separate PR.

* feat(rum): categorize transactions based on current url
* chore: address review
* chore: fix redefined transactions
* chore: move slug to url.js

vigneshshanmugam requested a review from hmdhk June 26, 2020 14:20

vigneshshanmugam commented Jun 26, 2020

packages/rum-core/src/common/slugify.js (outdated)

vigneshshanmugam commented Jun 26, 2020

packages/rum-core/src/common/slugify.js (outdated)

axw reviewed Jun 29, 2020

packages/rum-core/src/common/slugify.js (outdated)

hmdhk reviewed Jun 29, 2020

packages/rum-core/src/common/slugify.js (outdated)

packages/rum-core/src/common/slugify.js (outdated)

packages/rum-core/src/common/slugify.js (outdated)

vigneshshanmugam added 2 commits June 30, 2020 12:26

feat(rum): categorize transactions based on current url

0134178

chore: address review

3bce98b

vigneshshanmugam force-pushed the slug-page-load branch from 625d80d to 3bce98b Compare June 30, 2020 10:38

chore: fix redefined transactions

b9b3b04

vigneshshanmugam requested a review from hmdhk July 1, 2020 08:08

hmdhk reviewed Jul 1, 2020

packages/rum-core/src/performance-monitoring/transaction-service.js

packages/rum-core/src/common/slugify.js (outdated)

packages/rum-core/test/common/slugify.spec.js (outdated)

hmdhk reviewed Jul 2, 2020

packages/rum-core/test/common/slugify.spec.js (outdated)

packages/rum-core/src/performance-monitoring/transaction-service.js

chore: move slug to url.js

95df5a7

vigneshshanmugam requested a review from hmdhk July 3, 2020 09:10

hmdhk approved these changes Jul 3, 2020

hmdhk merged commit 3888653 into elastic:master Jul 3, 2020

vigneshshanmugam deleted the slug-page-load branch July 3, 2020 11:15

v1v added a commit to v1v/apm-agent-rum-js that referenced this pull request Jul 3, 2020

Merge remote-tracking branch 'upstream/master' into feature/ci-cdn

3533163

* upstream/master: feat(rum): categorize transactions based on current url (elastic#827)

vigneshshanmugam mentioned this pull request Jul 3, 2020

docs: add default categorisation for page load transaction #834

Merged

axw mentioned this pull request Aug 18, 2020

docs/agents: add sampling spec elastic/apm#307

Merged

felixbarny mentioned this pull request Aug 18, 2020

Transaction name should be "<METHOD> unknown route" when no automatic/configured name elastic/apm-agent-php#135

Open

axw mentioned this pull request Mar 31, 2022

For unknown routes, use unknown route as transaction name instead of full URL path elastic/apm-agent-go#1236

Open
