DB Grouping variables from statement #1053

Open
maryliag opened this issue May 21, 2024 · 5 comments
Labels
area:db enhancement New feature or request

Comments

@maryliag
Contributor

maryliag commented May 21, 2024

Is your change request related to a problem? Please describe.

One improvement to sanitization is to also group the replacements. I'm splitting this issue out from #717 to focus on the grouping itself.

Describe the solution you'd like

For example, when a statement had an IN clause, its list of values would be replaced by one of the following placeholders:

  • __more1_10__
  • __more10_100__
  • __more100_200__
  • __more200_300__
  • __more300_400__
  • __more400_500__
  • __more500_600__
  • __more600_700__
  • __more700_800__
  • __more800_900__
  • __more900_1000__
  • __more1000_plus__.

That created a nice balance between separating groups that would use different execution plans and keeping the cardinality of the possible final strings low, since the list can be quite big (I saw cases with 20k+ values in a list).
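A minimal sketch of how the bucketing above could be computed, in Python. The function name `in_list_placeholder` and the `_BOUNDS` table are hypothetical, not part of any existing instrumentation. The issue does not say which bucket a boundary value belongs to, so this sketch assumes upper bounds are inclusive (10 values map to `__more1_10__`, 11 values to `__more10_100__`).

```python
from bisect import bisect_left

# Upper bounds of each bucket, taken from the placeholder list above.
# Assumption: upper bounds are inclusive; the issue does not specify
# which bucket a boundary value such as 10 or 100 falls into.
_BOUNDS = [10, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]


def in_list_placeholder(num_values: int) -> str:
    """Return the bucket placeholder for an IN list with num_values items."""
    if num_values < 1:
        raise ValueError("an IN list has at least one value")
    # Index of the first bucket whose upper bound is >= num_values.
    idx = bisect_left(_BOUNDS, num_values)
    if idx == len(_BOUNDS):
        return "__more1000_plus__"
    lower = 1 if idx == 0 else _BOUNDS[idx - 1]
    return f"__more{lower}_{_BOUNDS[idx]}__"
```

With this mapping, lists of 23 and 87 values both become `__more10_100__`, and a 20k-value list becomes `__more1000_plus__`, so a sanitized statement can contain at most 12 distinct placeholders per IN clause.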

Describe alternatives you've considered

Another solution would be to always replace the list with the exact number of values being grouped, such as __more23__, but that would increase cardinality, and this level of detail is not that helpful. A bucket-based solution makes more sense.

Additional context

No response

@maryliag
Contributor Author

@trask I created the issue as we discussed at the last SIG, but I don't have permission to add it to the DB Client Semantic Convention project.

@joaopgrassi
Member

joaopgrassi commented May 22, 2024

I added it to the project now and removed the triage label :) @maryliag will you be working on this? Should I assign it to you?

@maryliag
Contributor Author

Thank you @joaopgrassi! And yes, you can assign it to me.

@trask
Member

trask commented Jun 21, 2024

Let's add something after #1100 to mention that IN lists MAY be collapsed in some way.

@trask
Member

trask commented Jul 12, 2024

Discussed previously in DB semconv meeting:

I sent #1243 to address #1053 (comment).

After that is merged we can postpone the remaining portions of this issue until after stability.

@trask moved this to Post Stability in Database Client Semantic Conventions on Nov 6, 2024
Projects
Status: Post Stability