Improve the JSON parser performance. #7723
Conversation
./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

```diff
- ingester -0.1%
- distributor -0.3%
+ querier 0%
+ querier/queryrange 0.1%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0%
```
```go
// Snapshot the current prefix position so the buffer can be restored
// to it once the nested object has been processed.
prefixLen := len(j.prefixBuffer)
j.prefixBuffer = append(j.prefixBuffer, byte(jsonSpacer))
j.prefixBuffer = appendSanitized(j.prefixBuffer, key)
```
We could sanitize and unescape at the same time.
We only sanitize keys, not values; this is to comply with the Prometheus spec.
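For illustration, here is a minimal sketch of what such a sanitize step could look like; the real `appendSanitized` in Loki may differ. Prometheus label names must match `[a-zA-Z_][a-zA-Z0-9_]*`, so every other byte is mapped to an underscore:

```go
// Hypothetical sketch, not Loki's actual implementation: append key to
// 'to', replacing every byte that is invalid in a Prometheus label name
// with '_'. Digits are allowed anywhere except the first position.
func appendSanitized(to, key []byte) []byte {
	for i, c := range key {
		valid := c == '_' ||
			(c >= 'a' && c <= 'z') ||
			(c >= 'A' && c <= 'Z') ||
			(i > 0 && c >= '0' && c <= '9')
		if valid {
			to = append(to, c)
		} else {
			to = append(to, '_')
		}
	}
	return to
}
```

Note that this mapping is lossy (both `.` and `-` become `_`), which is what makes the sanitize function irreversible, as mentioned further down.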
```go
// Stack-allocated array for allocation-free unescaping of small strings.
var stackbuf [unescapeStackBufSize]byte
bU, err := jsonparser.Unescape(b, stackbuf[:])
```
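For readers unfamiliar with the pattern: `jsonparser.Unescape(in, out)` returns the input unchanged when it contains no escapes, writes into `out` when the result fits, and only heap-allocates otherwise. A self-contained sketch (the buffer size of 64 is an assumption; Loki defines its own `unescapeStackBufSize`):

```go
package main

import (
	"fmt"

	"github.com/buger/jsonparser"
)

// Assumed size for illustration; Loki defines its own constant.
const unescapeStackBufSize = 64

func main() {
	// Raw bytes of a JSON string value, still containing the \n escape.
	b := []byte(`hello\nworld`)

	// stackbuf lives on this goroutine's stack, so small strings are
	// unescaped without touching the heap.
	var stackbuf [unescapeStackBufSize]byte
	bU, err := jsonparser.Unescape(b, stackbuf[:])
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", bU)
}
```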
This is really nice. Do you know how much overhead `Unescape` adds?
I'm wondering if we should go the other way around and use the escaped raw string. That makes things a little more complicated later on, when the labels are matched or read out. Overall I'm wondering if labels should be lazy. I've talked to Ed about this. If we can somehow ensure the original line is not garbage collected until processing is done, it would be enough to keep an index, or rather an unsafe pointer, into the line. No allocation would be required.
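A hypothetical sketch of that lazy idea, with illustrative names (nothing here is Loki's actual API): keep byte offsets into the original line and unescape only when a label is actually read.

```go
package labels

import "github.com/buger/jsonparser"

// lazyLabel defers all copying: it records where the raw (still escaped)
// value sits inside the original log line. The line slice must stay
// reachable until processing is done so the GC cannot collect it.
type lazyLabel struct {
	line       []byte
	start, end int
}

// Value materializes the label on access. Unescape returns the input
// slice untouched when there are no escapes, so the common case does
// not allocate at all.
func (l lazyLabel) Value() ([]byte, error) {
	return jsonparser.Unescape(l.line[l.start:l.end], nil)
}
```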
Most of the CPU time is spent there; I'm not sure we can change this without risk. However, I do like the lazy idea, give it a go. What would be great is to limit extraction, but I wasn't able to because the sanitize function is irreversible.
@cyriltovena @jeschkies Is this ready to be merged?
@MasslessParticle not really. I wanted to try something else first.
@jeschkies Should we convert this to a draft PR, then?
This is clean, through and through. A pleasure to read, sorry it took me so long.
@jeschkies happy to see what you come up with, but I don't think it should block this PR
looks awesome !!
@cyriltovena should we mention such changes in the changelog? I believe it is worth mentioning because it improves JSON parser performance a lot ))
Regression introduced by #7723 which was merged around the same time
Signed-off-by: Danny Kopping <[email protected]>
Sure, however it would be better to try it in a cluster to see the real difference. Maybe you can compare before and after? I'd love to know.
Sorry, I couldn't help myself and worked a bit on the JSON parser, using another library that allows far fewer allocations.
On top of this, the parser code is also much simpler to understand.
I realize that we could possibly also intern/cache (with a limit, obviously) extracted JSON values per key. I would do it per key because otherwise a high-cardinality key can take over the intern cache, which would make it worthless. Happy to walk someone through the idea; I think this would be a big jump too. A rough sketch is below.
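A rough sketch of that per-key interning idea, with hypothetical names and a simple size cap standing in for a real eviction policy:

```go
package cache

// maxPerKey is a placeholder bound; a real implementation would want a
// proper eviction policy rather than simply refusing new entries.
const maxPerKey = 1024

// valueCache interns extracted JSON values per key: each key gets its
// own bounded map, so a single high-cardinality key can only fill its
// own cache rather than evicting every other key's entries.
type valueCache struct {
	perKey map[string]map[string]string
}

func newValueCache() *valueCache {
	return &valueCache{perKey: make(map[string]map[string]string)}
}

// intern returns a canonical string for value under key, copying the
// bytes at most once per distinct value.
func (c *valueCache) intern(key string, value []byte) string {
	vals, ok := c.perKey[key]
	if !ok {
		vals = make(map[string]string)
		c.perKey[key] = vals
	}
	// Go optimizes map lookups keyed by string([]byte), so this probe
	// does not allocate.
	if s, ok := vals[string(value)]; ok {
		return s
	}
	if len(vals) >= maxPerKey {
		return string(value) // cache full for this key: don't intern
	}
	s := string(value)
	vals[s] = s
	return s
}
```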
Would be great to run and test this in our dev environment using some JSON queries before releasing it.