Improve the JSON parser performance. #7723
Conversation
./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

```diff
- ingester -0.1%
- distributor -0.3%
+ querier 0%
+ querier/queryrange 0.1%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0%
```
```go
// Snapshot the current prefix position so the buffer can be restored
// to it once the nested object has been processed.
prefixLen := len(j.prefixBuffer)
j.prefixBuffer = append(j.prefixBuffer, byte(jsonSpacer))
j.prefixBuffer = appendSanitized(j.prefixBuffer, key)
```
We could sanitize and unescape at the same time.
We only sanitize keys, not values; this is to comply with the Prometheus spec.
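For illustration, here is a minimal sketch of what such a sanitize step could look like; the real `appendSanitized` in Loki may differ. Prometheus label names must match `[a-zA-Z_][a-zA-Z0-9_]*`, so every other byte is mapped to an underscore:

```go
// Hypothetical sketch, not Loki's actual implementation: append key to
// 'to', replacing every byte that is invalid in a Prometheus label name
// with '_'. Digits are allowed anywhere except the first position.
func appendSanitized(to, key []byte) []byte {
	for i, c := range key {
		valid := c == '_' ||
			(c >= 'a' && c <= 'z') ||
			(c >= 'A' && c <= 'Z') ||
			(i > 0 && c >= '0' && c <= '9')
		if valid {
			to = append(to, c)
		} else {
			to = append(to, '_')
		}
	}
	return to
}
```

Note that this mapping is lossy (both `.` and `-` become `_`), which is what makes the sanitize function irreversible, as mentioned further down.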
```go
// Stack-allocated array for allocation-free unescaping of small strings.
var stackbuf [unescapeStackBufSize]byte
bU, err := jsonparser.Unescape(b, stackbuf[:])
```
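For readers unfamiliar with the pattern: `jsonparser.Unescape(in, out)` returns the input unchanged when it contains no escapes, writes into `out` when the result fits, and only heap-allocates otherwise. A self-contained sketch (the buffer size of 64 is an assumption; Loki defines its own `unescapeStackBufSize`):

```go
package main

import (
	"fmt"

	"github.com/buger/jsonparser"
)

// Assumed size for illustration; Loki defines its own constant.
const unescapeStackBufSize = 64

func main() {
	// Raw bytes of a JSON string value, still containing the \n escape.
	b := []byte(`hello\nworld`)

	// stackbuf lives on this goroutine's stack, so small strings are
	// unescaped without touching the heap.
	var stackbuf [unescapeStackBufSize]byte
	bU, err := jsonparser.Unescape(b, stackbuf[:])
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", bU)
}
```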
This is really nice. Do you know how much overhead `Unescape` adds?
I'm wondering if we should go the other way around and use the escaped raw string. That makes things a little more complicated later on, when the labels are matched or read out. Overall I'm wondering if labels should be lazy. I've talked to Ed about this. If we can somehow ensure the original line is not garbage collected until processing is done, it would be enough to keep an index, or rather an unsafe pointer, into the line. No allocation would be required.
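A hypothetical sketch of that lazy idea, with illustrative names (nothing here is Loki's actual API): keep byte offsets into the original line and unescape only when a label is actually read.

```go
package labels

import "github.com/buger/jsonparser"

// lazyLabel defers all copying: it records where the raw (still escaped)
// value sits inside the original log line. The line slice must stay
// reachable until processing is done so the GC cannot collect it.
type lazyLabel struct {
	line       []byte
	start, end int
}

// Value materializes the label on access. Unescape returns the input
// slice untouched when there are no escapes, so the common case does
// not allocate at all.
func (l lazyLabel) Value() ([]byte, error) {
	return jsonparser.Unescape(l.line[l.start:l.end], nil)
}
```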
Most of the CPU time is spent there; I'm not sure we can change this without risk. However, I do like the lazy idea, give it a go. What would be great is to limit extraction, but I wasn't able to because the sanitize function is irreversible.
@cyriltovena @jeschkies Is this ready to be merged?
@MasslessParticle not really. I wanted to try something else first.
@jeschkies Should we convert this to a draft PR, then?
This is clean, through and through. A pleasure to read, sorry it took me so long.
@jeschkies happy to see what you come up with, but I don't think it should block this PR
looks awesome !!
@cyriltovena should we mention such changes in the changelog? I believe it is worth mentioning because it improves JSON parser performance a lot ))
Regression introduced by #7723 which was merged around the same time
Signed-off-by: Danny Kopping <[email protected]>
Sure, however it would be better to try it in a cluster to see the real difference. Maybe you can compare before and after? I'd love to know.
Sorry, I couldn't help myself and worked a bit on the JSON parser, using another library that allows far fewer allocations.
On top of this, the parser code is also much simpler to understand.
I realize that we could possibly also intern/cache (with a limit, obviously) extracted JSON values per key. I would do it per key because otherwise a high-cardinality key can take over the intern cache, which would make it worthless. Happy to walk someone through the idea; I think this would be a big jump too. A rough sketch is below.
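A rough sketch of that per-key interning idea, with hypothetical names and a simple size cap standing in for a real eviction policy:

```go
package cache

// maxPerKey is a placeholder bound; a real implementation would want a
// proper eviction policy rather than simply refusing new entries.
const maxPerKey = 1024

// valueCache interns extracted JSON values per key: each key gets its
// own bounded map, so a single high-cardinality key can only fill its
// own cache rather than evicting every other key's entries.
type valueCache struct {
	perKey map[string]map[string]string
}

func newValueCache() *valueCache {
	return &valueCache{perKey: make(map[string]map[string]string)}
}

// intern returns a canonical string for value under key, copying the
// bytes at most once per distinct value.
func (c *valueCache) intern(key string, value []byte) string {
	vals, ok := c.perKey[key]
	if !ok {
		vals = make(map[string]string)
		c.perKey[key] = vals
	}
	// Go optimizes map lookups keyed by string([]byte), so this probe
	// does not allocate.
	if s, ok := vals[string(value)]; ok {
		return s
	}
	if len(vals) >= maxPerKey {
		return string(value) // cache full for this key: don't intern
	}
	s := string(value)
	vals[s] = s
	return s
}
```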
Would be great to run and test this in our dev environment using some JSON queries before releasing it.