Improve the JSON parser performance. #7723

Merged
merged 4 commits into grafana:main on Jan 12, 2023

Conversation

cyriltovena
Contributor

Sorry, I couldn't help myself and worked a bit on the JSON parser, switching to another library that allows far fewer allocations.

On top of this, the parser code is also much simpler to understand.

I realize that we could possibly also intern/cache (bounded, obviously) the extracted JSON values per key. I would do it per key because otherwise a high-cardinality key can take over the cache and make it worthless. Happy to walk someone through the idea; I think this would be another big jump.
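
A minimal sketch of the per-key interning idea, purely for illustration; the names (valueCache, intern) and the bound are made up here, and nothing like this is in the PR:

type valueCache struct {
	maxPerKey int
	byKey     map[string]map[string]string
}

func newValueCache(maxPerKey int) *valueCache {
	return &valueCache{maxPerKey: maxPerKey, byKey: map[string]map[string]string{}}
}

// intern returns a shared string for value under the given key. The bound is
// applied per key, so a single high-cardinality key can only fill its own
// bucket and cannot evict values cached for other keys.
func (c *valueCache) intern(key string, value []byte) string {
	bucket, ok := c.byKey[key]
	if !ok {
		bucket = make(map[string]string)
		c.byKey[key] = bucket
	}
	// Lookup keyed by string(value): the compiler elides the temporary string
	// allocation for this map-access pattern.
	if v, ok := bucket[string(value)]; ok {
		return v
	}
	if len(bucket) >= c.maxPerKey {
		// Bucket is full: fall back to a plain copy instead of evicting.
		return string(value)
	}
	v := string(value)
	bucket[v] = v
	return v
}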

It would be great to run and test this in our dev environment with some JSON queries before releasing it.

❯ benchstat before.txt after.txt                                                                                                              
name                             old time/op    new time/op    delta
_Parser/json/no_labels_hints-10    2.53µs ± 0%    2.16µs ± 2%  -14.59%  (p=0.008 n=5+5)
_Parser/json/labels_hints-10       1.93µs ± 0%    1.47µs ± 0%  -23.96%  (p=0.008 n=5+5)

name                             old alloc/op   new alloc/op   delta
_Parser/json/no_labels_hints-10      656B ± 0%      280B ± 0%  -57.32%  (p=0.008 n=5+5)
_Parser/json/labels_hints-10         512B ± 0%      176B ± 0%  -65.62%  (p=0.008 n=5+5)

name                             old allocs/op  new allocs/op  delta
_Parser/json/no_labels_hints-10      46.0 ± 0%      18.0 ± 0%  -60.87%  (p=0.008 n=5+5)
_Parser/json/labels_hints-10         39.0 ± 0%      12.0 ± 0%  -69.23%  (p=0.008 n=5+5)

@cyriltovena requested a review from a team as a code owner on November 18, 2022, 12:12
@grafanabot
Collaborator

./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

-           ingester	-0.1%
-        distributor	-0.3%
+            querier	0%
+ querier/queryrange	0.1%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

// snapshot the current prefix position
prefixLen := len(j.prefixBuffer)
j.prefixBuffer = append(j.prefixBuffer, byte(jsonSpacer))
j.prefixBuffer = appendSanitized(j.prefixBuffer, key)
Contributor

We could sanitize and unescape at the same time.

Contributor Author

We only sanitize keys, not values; this is to comply with the Prometheus spec.
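
For illustration, sanitizing a key so it stays a valid Prometheus label name ([a-zA-Z_][a-zA-Z0-9_]*) roughly amounts to the sketch below; this is only an approximation of the idea, not necessarily the exact appendSanitized used in this PR:

// Sketch only; the real implementation may differ. Every byte that is not
// valid in a Prometheus label name is replaced with '_'. Values are never
// rewritten, only keys.
func appendSanitized(to, key []byte) []byte {
	for i, c := range key {
		valid := c == '_' ||
			(c >= 'a' && c <= 'z') ||
			(c >= 'A' && c <= 'Z') ||
			(i > 0 && c >= '0' && c <= '9') // a digit may not start a label name
		if valid {
			to = append(to, c)
		} else {
			to = append(to, '_')
		}
	}
	return to
}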

Comment on lines +167 to +168
var stackbuf [unescapeStackBufSize]byte // stack-allocated array for allocation-free unescaping of small strings
bU, err := jsonparser.Unescape(b, stackbuf[:])
Contributor

This is really nice. Do you know how much overhead Unescape adds?

I'm wondering if we should go the other way around and use the escaped raw string. That makes it a little more complicated later on when the labels are matched or read out. Overall I'm wondering if labels should be lazy; I've talked to Ed about this. If we can somehow ensure the original line is not garbage collected until processing is done, it would be enough to keep an index, or rather an unsafe pointer, into the line. No allocation would be required.
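
One way to put a number on that overhead (illustrative only, not part of this PR) is a micro-benchmark of jsonparser.Unescape by itself, using the same stack-buffer pattern as the parser; the package name and input are made up:

package jsonbench

import (
	"testing"

	"github.com/buger/jsonparser"
)

// BenchmarkUnescape measures Unescape on a small escaped string, reusing a
// stack-allocated scratch buffer exactly like the parser does.
func BenchmarkUnescape(b *testing.B) {
	in := []byte(`msg\u003d\"hello\tworld\"\nlevel\u003dinfo`)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		const stackBufSize = 64
		var stackbuf [stackBufSize]byte
		if _, err := jsonparser.Unescape(in, stackbuf[:]); err != nil {
			b.Fatal(err)
		}
	}
}

Running it with go test -bench Unescape -benchmem would show the per-call cost and allocations.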

Contributor Author

Most of the CPU is spent there; I'm not sure we can change this without risk. However, I do like the lazy idea, give it a go. What would be great is to limit extraction, but I wasn't able to because the sanitize function is irreversible.
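
To make the lazy idea more concrete, a rough sketch (illustrative only, not part of this PR): keep offsets into the original line and defer the copy until a label is actually read, which only works if the line is guaranteed to outlive the label set:

// lazyLabel records where a value lives inside the original log line instead
// of copying it out; extraction itself allocates nothing.
type lazyLabel struct {
	line       []byte // the original line; it must stay alive as long as the label
	start, end int    // bounds of the value within line
}

// Value materializes the string on demand, so callers that never read the
// label never pay for the copy.
func (l lazyLabel) Value() string {
	return string(l.line[l.start:l.end])
}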

@MasslessParticle
Contributor

@cyriltovena @jeschkies Is this ready to be merged?

@jeschkies
Contributor

@MasslessParticle not really. I wanted to try something else first.

@MasslessParticle
Contributor

@jeschkies Should we convert this to a draft PR, then?

Member

@owen-d left a comment

This is clean, through and through. A pleasure to read; sorry it took me so long.

@jeschkies happy to see what you come up with, but I don't think it should block this PR

@MasslessParticle merged commit 2861c00 into grafana:main on Jan 12, 2023
Contributor

@vlad-diachenko left a comment

looks awesome!!

@vlad-diachenko
Contributor

@cyriltovena should we mention such changes in the changelog? I believe it might be worth mentioning because it improves JSON parser performance a lot ... ))

dannykopping pushed a commit to dannykopping/loki that referenced this pull request Jan 13, 2023
Regression introduced by grafana#7723 which was merged around the same time

Signed-off-by: Danny Kopping <[email protected]>
dannykopping pushed a commit that referenced this pull request Jan 13, 2023
Regression introduced by #7723 which was merged around the same time

Signed-off-by: Danny Kopping <[email protected]>
@cyriltovena
Contributor Author

@cyriltovena should we mention such changes in the changelog? I believe it might be worth mentioning because it improves JSON parser performance a lot ... ))

Sure, however it would be better to try it in a cluster to see the real difference. Maybe you can compare before and after? I'd love to know.
