
feat(rum): categorize transactions based on current url #827

Merged: 4 commits merged on Jul 3, 2020


@vigneshshanmugam (Member) commented Jun 26, 2020

  • Fixes #255: Provide a default name for page load transactions
  • Fixes #297: Use pathname as transaction name for route change
  • Addresses #56: Handling huge cardinality of page load transaction names
  • This PR solves the high-cardinality problem with a heuristic that shortens the page URL into a compact slug, so transactions get a name users can understand instead of UNKNOWN.
  • The slugify algorithm is a modified version of prior work by @jahtalab.
  • How it works:
    1. The depth of the path tree is always 2, so /a/b/c/d becomes /a/b/*. The depth is 2 rather than 1 because most websites use the locale as the first level of the URL tree, and we want to categorize pages beyond it instead of just reporting locales as transaction names.
    2. Trailing slashes do not create a new level of depth.
    3. Path segments consisting of digits, or ending in digits, are redacted, and everything after a redacted segment collapses to *, e.g. /1234/product -> /:id/*.
    4. Segments containing more special characters than the maximum of 2 are redacted.
    5. Segments that mix upper-case and lower-case characters are redacted.
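The five rules above can be sketched in JavaScript as follows. This is a hypothetical illustration, not the slugify implementation merged in this PR: the function names (slugifyPath, shouldRedact) and every threshold (two or more trailing digits, a maximum of 2 special characters, a 30-70% upper-case share for the mixed-case rule) are assumptions chosen for the example.

```javascript
// A minimal sketch of the heuristic described above, not the code merged in this PR.
// slugifyPath, shouldRedact and all thresholds below are hypothetical.
const REDACTED = ':id';
const WILDCARD = '*';
const MAX_SPECIAL_CHARS = 2; // rule 4: assumed limit on non-alphanumeric characters

function shouldRedact(segment) {
  // Rule 3: all-digit segments, or segments ending in two or more digits (e.g. "order42").
  if (/^\d+$/.test(segment) || /\d{2,}$/.test(segment)) return true;

  // Rule 4: more special characters than the allowed maximum.
  const special = (segment.match(/[^a-zA-Z0-9]/g) || []).length;
  if (special > MAX_SPECIAL_CHARS) return true;

  // Rule 5: a genuine mix of cases. The 30-70% band is an assumed cut-off
  // so that names like "About" or "iPhone" are kept.
  const upper = (segment.match(/[A-Z]/g) || []).length;
  const lower = (segment.match(/[a-z]/g) || []).length;
  if (upper > 0 && lower > 0) {
    const upperRatio = upper / (upper + lower);
    if (upperRatio >= 0.3 && upperRatio <= 0.7) return true;
  }
  return false;
}

function slugifyPath(path, depth = 2) {
  // Ignore the query string and fragment; only the path is categorized.
  const pathname = path.split(/[?#]/)[0];
  // Rule 2: empty segments (trailing or doubled slashes) never add depth.
  const segments = pathname.split('/').filter(Boolean);

  const parts = [];
  for (const segment of segments) {
    // Rule 1 (and the /:id/* example): past the depth limit, or after a
    // redacted segment, the rest of the path collapses into one wildcard.
    if (parts.length >= depth || parts[parts.length - 1] === REDACTED) {
      parts.push(WILDCARD);
      break;
    }
    parts.push(shouldRedact(segment) ? REDACTED : segment);
  }
  return '/' + parts.join('/');
}
```

For example, slugifyPath('/en/products/shoes/42') returns '/en/products/*', and slugifyPath('/users/1234/edit') returns '/users/:id/*'. The optional depth argument is only there to show how a caller could ask for a deeper name.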

Please see the test cases for a more in-depth overview.

The algorithm seems to work for most of the websites I tested with a crawler. The setup is here if you want to play around.

@vigneshshanmugam requested a review from @hmdhk on June 26, 2020 14:20
@apmmachine (Contributor) commented Jun 26, 2020

💚 Build Succeeded



Build stats

  • Build Cause: [Pull request #827 updated]

  • Start Time: 2020-07-03T08:50:52.748+0000

  • Duration: 76 min 16 sec

Test stats 🧪

Test Results

  • Failed: 0
  • Passed: 968
  • Skipped: 10
  • Total: 978

Steps errors


  • Name: Bundlesize
    • Description: #!/bin/bash set -o pipefail npm run bundlesize|tee bundlesize.txt

    • Duration: 1 min 30 sec

    • Start Time: 2020-07-03T08:56:34.030+0000


@axw (Member) left a comment


Seems like a decent approach to me, particularly because it's easy to override in cases where this approach doesn't work.

What do you think about adding the function to the agent API, so folks can call it directly with a greater depth as needed?

Review thread on packages/rum-core/src/common/slugify.js (outdated, resolved)
@vigneshshanmugam (Member, Author) commented

@axw Good suggestion, but I am worried about one thing: if it creates too much cardinality in transaction names, our metrics-based approach in the APM UI might suffer as a consequence.
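To make the depth trade-off concrete (2 vs. 3, or a depth chosen by the caller), here is a hypothetical sketch that counts distinct transaction names for a small made-up set of paths at different depths. It truncates by depth only and leaves out the redaction rules, so it overstates cardinality for paths that contain IDs; truncateToDepth, distinctNames, and the sample paths are all illustrative.

```javascript
// Hypothetical sample of page paths; real traffic would be far larger.
const samplePaths = [
  '/en/products/shoes/running',
  '/en/products/shoes/trail',
  '/en/products/bags',
  '/en/blog/2020/launch',
  '/de/products/shoes/running',
  '/de/blog/2020/launch',
];

// Depth-only truncation; redaction is deliberately left out to isolate the effect of depth.
function truncateToDepth(path, depth) {
  const segments = path.split('/').filter(Boolean); // trailing slashes add no depth
  const kept = segments.slice(0, depth);
  if (segments.length > depth) kept.push('*');
  return '/' + kept.join('/');
}

// Number of distinct transaction names the sample would produce at a given depth.
function distinctNames(paths, depth) {
  return new Set(paths.map((p) => truncateToDepth(p, depth))).size;
}

for (const depth of [1, 2, 3]) {
  console.log(`depth ${depth}: ${distinctNames(samplePaths, depth)} distinct names`);
}
// prints "depth 1: 2 distinct names", "depth 2: 4 distinct names", "depth 3: 5 distinct names"
```

Each extra level can only split existing names further, so letting users raise the depth directly trades readability of names against the number of groups the UI has to aggregate.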

@hmdhk (Contributor) left a comment


@vigneshshanmugam, I'm a bit concerned about the performance impact of this; we should run some benchmarks to make sure. We should probably also use 3 URL parts instead of 2, though it's only my gut feeling that 2 is too small 😃

Also, I think we can move this into utils; I don't think a separate file is needed.

Three review threads on packages/rum-core/src/common/slugify.js (outdated, resolved)
@hmdhk (Contributor) commented Jun 29, 2020

@axw, we've discussed the approach here, and there seem to be some benefits to putting this logic in the APM Server: the logic is used in aggregation, so it should be tied to the stack version. WDYT?

This logic is also useful for replacing unknown transaction names. That could also be done in the APM Server, or we can keep it in the agent.

@axw (Member) commented Jun 30, 2020

@vigneshshanmugam

"Good suggestion, I am just worried about one thing though. If it creates too much cardinality in transaction name and our metrics based approach in APM UI might suffer due to the consequence of this change."

Fair enough. I'm a bit worried that people who can't use this approach, with a fixed depth of 2 (or 3), will revert to using the whole path. I guess we can deal with that later, though, if it does become a problem.

@jahtalab

"We've discussed the approach here and it seems there are some benefits on putting this logic in APM server. Since the logic here is used in aggregation and that should be tied to the stack version. WDYT?"

What are those benefits? Performance? I'd be interested to know what the overhead of this is; it's only for page loads, so it doesn't seem like it would be significant. On the other hand, pushing it to the server means concentrating the performance cost, which might end up being very expensive when your application has many users.

To me it feels simpler overall to set it in the agent, as it's just another agent as far as the server is concerned. Setting it in the agent provides greater flexibility, whereas the server would have only limited configuration; we wouldn't be adding arbitrary script execution. I'd also be a bit wary of creating an expectation of transaction name overriding for non-RUM, which won't work because of client-side breakdown aggregations.

@hmdhk (Contributor) commented Jun 30, 2020

@axw, the main benefit of having this in the APM Server is alignment with the stack version release. In other words, since this is used by the APM Server for aggregation, any change to this logic will affect the results stored in ES and should therefore be released with proper versioning (e.g. if there's a breaking change). If this doesn't seem like an issue to you, I'm OK with moving forward with the current implementation.

The performance on the agent side is not a big concern (although we will verify this with numbers).

@axw (Member) commented Jun 30, 2020

Transaction names have always been used for grouping and aggregation (in the UI), so I don't see a need to tie this to a stack version. I'm happy to continue with the current approach.

@vigneshshanmugam (Member, Author) commented

"Fair enough. I'm a bit worried that people who can't use this approach, with a fixed depth of 2 (or 3), will revert to using the whole path. I guess we can deal with that later though, if it does become a problem."

Yeah, we can detect when the whole path is used, apply the same redaction logic, and log a warning in the agent itself.

"The performance on the agent side is not a big concern (although we will verify this with numbers)"

I will post the numbers, but the last time I checked, it wasn't hitting us hard.

@vigneshshanmugam (Member, Author) commented

Benchmark numbers

Unique transaction count: ~650,000
Unique transaction names generated: 72

Here are the results:

{ 
  mean: 0.004329631194898993,
  median: 0.001528,
  p75: 2.633728,
  p90: 2.633728,
  p99: 2.633728 
}

The impact is minimal, and it will not affect the agent code much, since we would still ask users to set pageLoadTransactionName if they need more granular control.
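The benchmark above reports mean, median, p75, p90, and p99. A sketch of how such a summary could be computed from per-call timings follows; it is not the harness behind the numbers above. summarize, timeEach, and percentile are hypothetical names, and the nearest-rank percentile is only one of several common definitions, so other tools may report slightly different values. performance.now() is assumed to be available (browsers, Node 16+).

```javascript
// Nearest-rank percentile over an ascending array: the smallest sample
// with at least p% of all samples at or below it.
function percentile(sorted, p) {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Summarizes raw durations (in milliseconds) into the same fields as the report above.
function summarize(samples) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const mean = sorted.reduce((sum, x) => sum + x, 0) / sorted.length;
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return {
    mean,
    median,
    p75: percentile(sorted, 75),
    p90: percentile(sorted, 90),
    p99: percentile(sorted, 99),
  };
}

// Times fn once per input with the high-resolution clock; returns milliseconds per call.
function timeEach(fn, inputs) {
  return inputs.map((input) => {
    const start = performance.now();
    fn(input);
    return performance.now() - start;
  });
}
```

Usage would look like summarize(timeEach(slugify, urls)), with slugify and urls standing in for the function under test and the crawled URL list. For very cheap calls, the clock's resolution dominates individual timings, so timing batches of calls tends to give steadier numbers.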

@apmmachine (Contributor) commented Jun 30, 2020

📦 Bundlesize report

Filename                              Size (bundled)   Size (gzip)   Diff (gzip)
elastic-apm-opentracing.umd.min.js    60.1 KiB         19.3 KiB      ⚠️ 271 bytes
elastic-apm-rum.umd.min.js            54.2 KiB         17.9 KiB      ⚠️ 267 bytes

@vigneshshanmugam requested a review from @hmdhk on July 1, 2020 08:08
@hmdhk (Contributor) left a comment


@vigneshshanmugam, what do you think about moving slugify into utils?

Review threads on packages/rum-core/src/common/slugify.js and packages/rum-core/test/common/slugify.spec.js (outdated, resolved)
@vigneshshanmugam (Member, Author) commented

@jahtalab I thought about it. The utils file is already huge, and putting this functionality there would grow it further and make it harder to skim. I am in favor of keeping it in a separate file, as that is cleaner.

@codecov-commenter commented
Codecov Report

Merging #827 into master will increase coverage by 0.04%.
The diff coverage is 91.66%.

@@            Coverage Diff             @@
##           master     #827      +/-   ##
==========================================
+ Coverage   92.94%   92.99%   +0.04%     
==========================================
  Files          50       51       +1     
  Lines        2283     2311      +28     
  Branches      458      466       +8     
==========================================
+ Hits         2122     2149      +27     
- Misses        158      159       +1     
  Partials        3        3              
Impacted Files Coverage Δ
packages/rum-core/src/bootstrap.js 35.29% <0.00%> (-2.21%) ⬇️
packages/rum-core/src/common/patching/index.js 100.00% <ø> (ø)
...c/performance-monitoring/performance-monitoring.js 94.90% <ø> (ø)
...e/src/performance-monitoring/capture-navigation.js 100.00% <100.00%> (ø)
packages/rum-core/src/state.js 100.00% <100.00%> (ø)
packages/rum-core/src/common/slugify.js 100.00% <0.00%> (ø)
.../src/performance-monitoring/transaction-service.js 90.30% <0.00%> (+0.11%) ⬆️

@hmdhk (Contributor) left a comment


We decided to move slugify to url.js.

@vigneshshanmugam requested a review from @hmdhk on July 3, 2020 09:10
@vigneshshanmugam (Member, Author) commented

I will do the docs change on a separate PR.

@hmdhk merged commit 3888653 into elastic:master on Jul 3, 2020
@vigneshshanmugam deleted the slug-page-load branch on July 3, 2020 11:15
v1v added a commit to v1v/apm-agent-rum-js that referenced this pull request Jul 3, 2020
* upstream/master:
  feat(rum): categorize transactions based on current url (elastic#827)
David-Development pushed a commit to David-Development/apm-agent-rum-js that referenced this pull request Oct 20, 2021
* feat(rum): categorize transactions based on current url

* chore: address review

* chore: fix redefined transactions

* chore: move slug to url.js
Successfully merging this pull request may close these issues:
  • use pathname as transaction name for route change
  • Provide a default name for page load transactions
5 participants