Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: skipping label if it contains special symbol #14068

Merged
merged 3 commits into from
Sep 10, 2024

Conversation

ravishankar15
Copy link
Contributor

@ravishankar15 ravishankar15 commented Sep 6, 2024

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #13963

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@vlad-diachenko
Copy link
Contributor

Hey @ravishankar15 .
Huge thanks for the contribution, looks awesome.
That said, I've done some research to see how the changes affect the performance because this part is crucial for Loki as long as it happens on every query (when parsers are used).
Also, I expect that we won't face utf8.RuneError in the label value in most cases, so we have to make this scenario the most optimized.

Here the code I used in the benchmark

package stringreplacement

import (
	"bytes"
	"strings"
	"testing"
	"unicode/utf8"
)

func replaceNoop(source string) string {
	return source
}

var removeInvalidUtf = func(r rune) rune {
	if r == utf8.RuneError {
		return -1
	}
	return r
}

func replaceContainsAndStringsMap(source string) string {
	if strings.ContainsRune(source, utf8.RuneError) {
		strings.Map(removeInvalidUtf, source)
	}
	return source
}

func replaceStringsMap(source string) string {
	return strings.Map(removeInvalidUtf, source)
}

func replaceContainsAndBytesMap(source []byte) []byte {
	if bytes.ContainsRune(source, utf8.RuneError) {
		bytes.Map(removeInvalidUtf, source)
	}
	return source
}

func replaceBytesMap(source []byte) []byte {
	return bytes.Map(removeInvalidUtf, source)
}

func Benchmark_runeReplacement(b *testing.B) {
	scenarios := map[string]string{
		"with error rune":    "hello�world",
		"without error rune": "hello world",
	}
	for name, source := range scenarios {
		b.Run(name, func(b *testing.B) {
			b.Run("noop", func(b *testing.B) {
				b.ResetTimer()
				b.ReportAllocs()
				for i := 0; i < b.N; i++ {
					replaceNoop(source)
				}
			})

			b.Run("contains and strings.Map()", func(b *testing.B) {
				b.ResetTimer()
				b.ReportAllocs()
				for i := 0; i < b.N; i++ {
					replaceContainsAndStringsMap(source)
				}
			})

			b.Run("strings.Map()", func(b *testing.B) {
				b.ResetTimer()
				b.ReportAllocs()
				for i := 0; i < b.N; i++ {
					replaceStringsMap(source)
				}
			})

			b.Run("contains and bytes.Map()", func(b *testing.B) {
				bytesSource := []byte(source)
				b.ResetTimer()
				b.ReportAllocs()
				for i := 0; i < b.N; i++ {
					replaceContainsAndBytesMap(bytesSource)
				}
			})

			b.Run("bytes.Map()", func(b *testing.B) {
				bytesSource := []byte(source)
				b.ResetTimer()
				b.ReportAllocs()
				for i := 0; i < b.N; i++ {
					replaceBytesMap(bytesSource)
				}
			})
		})
	}
}

here are the results of the benchmark

goos: darwin
goarch: arm64
pkg: go-playground/pkg/stringreplacement
cpu: Apple M3 Pro
Benchmark_runeReplacement
Benchmark_runeReplacement/with_error_rune
Benchmark_runeReplacement/with_error_rune/noop
Benchmark_runeReplacement/with_error_rune/noop-12    	1000000000	         0.2514 ns/op	       0 B/op	       0 allocs/op
Benchmark_runeReplacement/with_error_rune/contains_and_strings.Map()
Benchmark_runeReplacement/with_error_rune/contains_and_strings.Map()-12         	25887314	        46.11 ns/op	      24 B/op	       1 allocs/op
Benchmark_runeReplacement/with_error_rune/strings.Map()
Benchmark_runeReplacement/with_error_rune/strings.Map()-12                      	30545458	        39.25 ns/op	      24 B/op	       1 allocs/op
Benchmark_runeReplacement/with_error_rune/contains_and_bytes.Map()
Benchmark_runeReplacement/with_error_rune/contains_and_bytes.Map()-12           	24849993	        46.43 ns/op	      16 B/op	       1 allocs/op
Benchmark_runeReplacement/with_error_rune/bytes.Map()
Benchmark_runeReplacement/with_error_rune/bytes.Map()-12                        	34127226	        35.98 ns/op	      16 B/op	       1 allocs/op
Benchmark_runeReplacement/without_error_rune
Benchmark_runeReplacement/without_error_rune/noop
Benchmark_runeReplacement/without_error_rune/noop-12                            	1000000000	         0.2521 ns/op	       0 B/op	       0 allocs/op
Benchmark_runeReplacement/without_error_rune/contains_and_strings.Map()
Benchmark_runeReplacement/without_error_rune/contains_and_strings.Map()-12      	145209346	         8.269 ns/op	       0 B/op	       0 allocs/op
Benchmark_runeReplacement/without_error_rune/strings.Map()
Benchmark_runeReplacement/without_error_rune/strings.Map()-12                   	74385352	        16.16 ns/op	       0 B/op	       0 allocs/op
Benchmark_runeReplacement/without_error_rune/contains_and_bytes.Map()
Benchmark_runeReplacement/without_error_rune/contains_and_bytes.Map()-12        	62231515	        19.52 ns/op	       0 B/op	       0 allocs/op
Benchmark_runeReplacement/without_error_rune/bytes.Map()
Benchmark_runeReplacement/without_error_rune/bytes.Map()-12                     	37209013	        31.44 ns/op	      16 B/op	       1 allocs/op
PASS

If you check the results, we can see that if we add the check if the value contains utf8.RuneError before calling bytes.Map() or strings.Map(), we can reduce execution time almost by twice:

Benchmark_runeReplacement/without_error_rune/contains_and_bytes.Map()-12        	62231515	        19.52 ns/op
VS 
Benchmark_runeReplacement/without_error_rune/bytes.Map()-12                     	37209013	        31.44 ns/op

and

Benchmark_runeReplacement/without_error_rune/contains_and_strings.Map()-12      	145209346	         8.269 ns/op
VS
Benchmark_runeReplacement/without_error_rune/strings.Map()-12                   	74385352	        16.16 ns/op

Also, you can see that bytes.Map() causes additional allocation every time we call it, regardless we modify the source string or not, so, adding the check before calling it fixes this issue as well...

Benchmark_runeReplacement/without_error_rune/contains_and_bytes.Map()-12        	0 B/op	       0 allocs/op
VS
Benchmark_runeReplacement/without_error_rune/bytes.Map()-12                     	16 B/op	       1 allocs/op

Copy link
Contributor

@vlad-diachenko vlad-diachenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great, however, we need to address some small things to get it merged

if r == utf8.RuneError {
return ""
return -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's necessary to replace it with (whitespace) instead of skipping it

}
res = strings.Map(removeInvalidUtf, res)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see: #14068 (comment)

Suggested change
res = strings.Map(removeInvalidUtf, res)
if strings.ContainsRune(source, utf8.RuneError) {
res = strings.Map(removeInvalidUtf, res)
}

for _, r := range res {

// rune error is rejected by Prometheus hence replacing them with space
removeInvalidUtf := func(r rune) rune {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please move this function outside of unescapeJSONString function to re-use it in logfmt parser ...

val = nil

// the rune error replacement is rejected by Prometheus hence skiping them.
removeInvalidUtf := func(r rune) rune {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can be re-used from json parse

}
val = bytes.Map(removeInvalidUtf, val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see: #14068 (comment)

Suggested change
val = bytes.Map(removeInvalidUtf, val)
if bytes.ContainsRune(source, utf8.RuneError) {
val = bytes.Map(removeInvalidUtf, val)
}

@ravishankar15
Copy link
Contributor Author

@vlad-diachenko Thanks for the insights :) I have made the suggestion

Copy link
Contributor

@vlad-diachenko vlad-diachenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks awesome 💎 🥇 huge thanks for the contribution @ravishankar15

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[loki] Json|Logfmt Parser returns empty label value if it contains symbol
2 participants