improve normalizer cpu usage #43

Merged
merged 6 commits into from
Dec 17, 2024

Collaborator

@lu-zhengda lu-zhengda commented Dec 16, 2024

This PR reduces the Normalizer's high CPU utilization by removing frequent calls to strings.ToUpper. Profiling with go test's -cpuprofile flag showed that strings.ToUpper was a major contributor to CPU usage.

Key Changes

1. Parsing Behavior for SQL Commands:

  • Updated behavior to only match SQL commands when identifiers are:
    • All uppercase (e.g., SELECT)
    • All lowercase (e.g., select)
    • Titlecase (e.g., Select)
  • Identifiers with mixed casing (e.g., sElEcT) are no longer recognized as SQL commands in the statement metadata.
  • This change is a deliberate tradeoff between performance and completeness to avoid repeatedly calling strings.ToUpper on every identifier, which was the main cause of high CPU usage.
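One way to get this matching behavior without any per-identifier case conversion is a lookup table that already contains the three accepted spellings of each command. The following is a minimal sketch of that idea, not the code in this PR: the `keywords` list, `commandVariants`, `buildVariants`, and `lookupCommand` are all hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// keywords is a hypothetical, abbreviated command list for illustration.
var keywords = []string{"SELECT", "INSERT", "UPDATE", "DELETE"}

// commandVariants maps each accepted spelling (upper, lower, title case)
// to its canonical uppercase command. It is built once at startup.
var commandVariants = buildVariants(keywords)

func buildVariants(upper []string) map[string]string {
	m := make(map[string]string, len(upper)*3)
	for _, kw := range upper {
		lower := strings.ToLower(kw)
		m[kw] = kw                // SELECT
		m[lower] = kw             // select
		m[kw[:1]+lower[1:]] = kw  // Select
	}
	return m
}

// lookupCommand returns the canonical command for ident, or false when the
// spelling is not one of the three accepted casings. No case conversion
// happens on this path; it is a single map lookup.
func lookupCommand(ident string) (string, bool) {
	cmd, ok := commandVariants[ident]
	return cmd, ok
}

func main() {
	for _, ident := range []string{"SELECT", "select", "Select", "sElEcT"} {
		cmd, ok := lookupCommand(ident)
		fmt.Printf("%-6s -> %q %v\n", ident, cmd, ok)
	}
}
```

The cost of the case handling moves to startup, where each keyword is lowercased once. The mixed-case input `sElEcT` simply misses the table, which is the tradeoff described above.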

2. Normalized Query Output:

  • The end output of the normalized query statement remains unaffected.
  • Users can still opt to uppercase all SQL keywords if desired.
  • By default, the normalizer does not alter the case of the input query, preserving existing behavior.

Benchmark Results

Before vs. After

| Sub-Benchmark | Iterations (↑) | Time per Op (ns/op) (↓) | Memory (B/op) (↓) | Allocations (↓) |
|---|---|---|---|---|
| Escaping/512 | +34.7% | -27.7% | -36.7% | -57.9% |
| Grouping/199 | +37.4% | -28.9% | -27.4% | -42.1% |
| Large/3694 | +73.3% | -45.7% | -45.4% | -69.2% |
| Complex/969 | +79.5% | -43.2% | -48.3% | -65.6% |
| SuperLarge/4198 | +101.6% | -43.8% | -42.0% | -70.0% |

Summary of Improvements

Performance

  • Execution time dropped by ~27.7% to ~45.7% across benchmarks, with the largest gains on the larger queries (Large, Complex, SuperLarge).

Memory

  • Reduced memory consumption by ~27.4% to ~48.3%, lowering pressure on the garbage collector.

https://datadoghq.atlassian.net/browse/DBMON-4759

@lu-zhengda
Collaborator Author

lu-zhengda commented Dec 16, 2024

Before

 ~/go/src/github.com/DataDog/go-sqllexer/ [main] go test -benchmem -run=^$ -bench ^BenchmarkObfuscationAndNormalization$ github.com/DataDog/go-sqllexer 
goos: darwin
goarch: arm64
pkg: github.com/DataDog/go-sqllexer
BenchmarkObfuscationAndNormalization/Escaping/512-10              144000              7182 ns/op            1440 B/op         76 allocs/op
BenchmarkObfuscationAndNormalization/Grouping/199-10              288408              4121 ns/op             760 B/op         38 allocs/op
BenchmarkObfuscationAndNormalization/Large/3694-10                 12823             94193 ns/op           26360 B/op       1103 allocs/op
BenchmarkObfuscationAndNormalization/Complex/969-10                44007             27684 ns/op            6520 B/op        302 allocs/op
BenchmarkObfuscationAndNormalization/SuperLarge/4198-10            10000            113920 ns/op           41448 B/op       1059 allocs/op
PASS
ok      github.com/DataDog/go-sqllexer  7.576s

After

 ~/go/src/github.com/DataDog/go-sqllexer/ [zhengda.lu/normalize*] go test -benchmem -run=^$ -bench ^BenchmarkObfuscationAndNormalization$ github.com/DataDog/go-sqllexer -cpuprofile=cpu.prof
goos: darwin
goarch: arm64
pkg: github.com/DataDog/go-sqllexer
BenchmarkObfuscationAndNormalization/Escaping/512-10              194044              5191 ns/op             912 B/op         32 allocs/op
BenchmarkObfuscationAndNormalization/Grouping/199-10              396208              2932 ns/op             552 B/op         22 allocs/op
BenchmarkObfuscationAndNormalization/Large/3694-10                 22224             51106 ns/op           14392 B/op        340 allocs/op
BenchmarkObfuscationAndNormalization/Complex/969-10                78969             15727 ns/op            3368 B/op        104 allocs/op
BenchmarkObfuscationAndNormalization/SuperLarge/4198-10            20156             64022 ns/op           24032 B/op        318 allocs/op
PASS
ok      github.com/DataDog/go-sqllexer  7.852s
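
The percentages in the PR description can be recomputed from the two runs above as (after - before) / before. A small sketch, with `pctChange` being a helper name made up for this example:

```go
package main

import "fmt"

// pctChange returns the relative change from before to after, in percent.
// Negative values mean the metric went down.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Values copied from the Before/After benchmark output above.
	fmt.Printf("Escaping ns/op:    %.1f%%\n", pctChange(7182, 5191))   // -27.7%
	fmt.Printf("Grouping B/op:     %.1f%%\n", pctChange(760, 552))     // -27.4%
	fmt.Printf("Complex B/op:      %.1f%%\n", pctChange(6520, 3368))   // -48.3%
	fmt.Printf("SuperLarge allocs: %.1f%%\n", pctChange(1059, 318))    // -70.0%
}
```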

@lu-zhengda
Collaborator Author

strings.ToUpper consumes high CPU because it processes each character individually, performing Unicode-compliant transformations.

  • Unicode Table Lookups: Determines if a character needs transformation, which is computationally intensive, especially for non-ASCII characters.
  • UTF-8 Decoding: Converts multi-byte characters to runes and back, adding overhead.
  • Memory Allocation: Creates a new string for every transformation, increasing GC activity.
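
The allocation point can be checked directly with `testing.AllocsPerRun` from the standard library. This is a rough sketch rather than part of the PR; `allocsPerToUpper` and `sink` are names made up here. As far as I can tell from the Go source, an input that is already all-uppercase ASCII is returned unchanged, so the cost falls on identifiers that actually need conversion, such as lowercase keywords.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink keeps the result alive so the compiler cannot discard the call.
var sink string

// allocsPerToUpper reports the average number of heap allocations made by
// one strings.ToUpper(s) call.
func allocsPerToUpper(s string) float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = strings.ToUpper(s)
	})
}

func main() {
	for _, s := range []string{"SELECT", "select", "Select"} {
		fmt.Printf("%-6s allocs/op: %v\n", s, allocsPerToUpper(s))
	}
}
```

Since most identifiers in real queries are lowercase table and column names, nearly every call on the old path would have paid for a new string.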

@lu-zhengda lu-zhengda marked this pull request as ready for review December 17, 2024 02:42
@lu-zhengda lu-zhengda requested a review from a team as a code owner December 17, 2024 02:42
@lu-zhengda lu-zhengda merged commit a418b44 into main Dec 17, 2024
4 checks passed
@lu-zhengda lu-zhengda deleted the zhengda.lu/normalize branch December 17, 2024 19:28