
Reduce the garbage produced by GraphQL while handling the request #33676

Merged: 1 commit, Jun 28, 2023

Conversation

franz1981
Contributor

I've created a benchmark at https://github.com/franz1981/java-puzzles/blob/4a080cb08ada854a44877748f25a434e91b0b3a2/src/main/java/red/hat/puzzles/http/AcceptCharsetParseBenchmark.java which can be run in isolation.
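For context, the benchmark follows the usual JMH shape; a stripped-down sketch is shown below (not the actual file, with a placeholder parsing method), and the ·gc.alloc.rate.norm rows come from running it with the GC profiler (-prof gc), which reports bytes allocated per operation.

    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Benchmark)
    @BenchmarkMode(Mode.Throughput)
    @OutputTimeUnit(TimeUnit.SECONDS)
    public class AcceptCharsetParseBenchmarkSketch {

        // hypothetical inputs: a Content-Type with a charset parameter and one without
        private final String withCharset = "application/json;charset=utf-8";
        private final String withoutCharset = "application/json";

        @Benchmark
        public String parseWithCharset() {
            return getCharset(withCharset);
        }

        @Benchmark
        public String parseNoCharsetNoSemicolon() {
            return getCharset(withoutCharset);
        }

        // placeholder: the real benchmark plugs in each variant being compared
        // (String::split, the split-free version of this PR, StringTokenizer, ...)
        private static String getCharset(String mimeType) {
            final int i = mimeType.indexOf("charset=");
            return i < 0 ? "UTF-8" : mimeType.substring(i + "charset=".length());
        }
    }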

The results using JDK 17.0.6, OpenJDK 64-Bit Server VM, 17.0.6+10 are:

Benchmark                                                                                  Mode  Cnt          Score          Error   Units
AcceptCharsetParseBenchmark.parseCharsetsNoSplit                                          thrpt   10     739545.185 ±    18019.748   ops/s
AcceptCharsetParseBenchmark.parseCharsetsNoSplit:·gc.alloc.rate.norm                      thrpt   10       1344.115 ±        0.023    B/op
AcceptCharsetParseBenchmark.parseCharsetsNoSplitNoCharsetNoSemicolon                      thrpt   10  134652732.274 ± 32894368.169   ops/s
AcceptCharsetParseBenchmark.parseCharsetsNoSplitNoCharsetNoSemicolon:·gc.alloc.rate.norm  thrpt   10         ≈ 10⁻⁵                   B/op
AcceptCharsetParseBenchmark.parseCharsetsSplit                                            thrpt   10     111006.612 ±     4177.994   ops/s
AcceptCharsetParseBenchmark.parseCharsetsSplit:·gc.alloc.rate.norm                        thrpt   10      15585.345 ±        0.166    B/op
AcceptCharsetParseBenchmark.parseCharsetsSplitNoCharsetNoSemicolon                        thrpt   10  160658450.869 ± 18929886.627   ops/s
AcceptCharsetParseBenchmark.parseCharsetsSplitNoCharsetNoSemicolon:·gc.alloc.rate.norm    thrpt   10         ≈ 10⁻⁶                   B/op

The regression in the *NoSplitNoCharsetNoSemicolon case is due to a missing upfront check for the presence of ;, but the overall new performance justifies it (i.e. it's a trade-off).

The original issue was due to the weird performance found at FgForrest/HttpServerEvaluationTest#1

And, after moving from HTTP/2 to HTTP/1.1, the allocation flamegraph reports the String::split in getCharset as an (on average) hot path, which this PR aims to fix.
It's not a game changer and, in order to make it faster, I'm no longer checking for semicolon presence nor OWS presence before the searched charset=, assuming a validator has already enforced that upfront.
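As a rough sketch of that approach (not the exact code in this PR), the parameter is located with String::indexOf instead of splitting on ';'; assuming, as above, that the value was already validated upstream, the only allocation left is the charset value itself:

    // Sketch only: the real PR also deals with case differences and surrounding whitespace.
    private static String getCharsetNoSplit(String contentType) {
        if (contentType == null) {
            return "UTF-8";
        }
        final int start = contentType.indexOf("charset=");
        if (start < 0) {
            return "UTF-8";
        }
        final int valueStart = start + "charset=".length();
        // the value ends at the next ';' (if more parameters follow) or at the end of the header
        final int end = contentType.indexOf(';', valueStart);
        return end < 0 ? contentType.substring(valueStart) : contentType.substring(valueStart, end);
    }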

@franz1981
Contributor Author

@jmartisk As said, this is not a game changer, but it's implemented in a way that "common" charsets could be cached with ease, e.g. utf-8, us-ascii.

I just want to be sure that my assumptions about the pre-validation hold.
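A hypothetical sketch of that caching (not part of this PR as merged): once the boundaries of the charset value are known, the common names can be matched in place and returned as constants, so the usual cases allocate nothing at all.

    // Hypothetical pooling sketch: match the already-located value region against
    // well-known charset names and return shared constants for them.
    private static String pooledCharset(String header, int start, int end) {
        final int len = end - start;
        if (len == 5 && header.regionMatches(true, start, "utf-8", 0, 5)) {
            return "UTF-8";
        }
        if (len == 8 && header.regionMatches(true, start, "us-ascii", 0, 8)) {
            return "US-ASCII";
        }
        // uncommon charset: fall back to copying the value out of the header
        return header.substring(start, end);
    }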

@gsmet
Member

gsmet commented May 29, 2023

Is it really a hot path when actually executing a query? Because if it's not a game changer, I'm not sure the added complexity is worth it.

@franz1981
Contributor Author

franz1981 commented May 29, 2023

Is it really a hot path when actually executing a query?

Yep, it is hot in the sense that it is always hit, causing allocations for no reason.
It's not a game changer because there is another hot path, much more costly (and more complex to solve), in ArC; I will send a fix for that one as well.
The code complexity can be improved by changing what it does while still avoiding String::split in such a hot path.
Re the current level of improvement, please look at the gc.alloc.rate.norm statistics: this version of the code can be roughly 15 times cheaper on average, memory-wise, while also being faster. Although allocation is not the dominant factor here, it means much better usage of the Java heap (especially good for native images).

@gsmet
Member

gsmet commented May 29, 2023

Have you tried either compiling the pattern or using a StringTokenizer? Because that would be far more readable if it’s good enough.

@franz1981
Contributor Author

Have you tried either compiling the pattern or using a StringTokenizer

I will happily do it and will use the benchmark to check how it compares; do you want me to turn the PR into a draft in the meantime?

@gsmet gsmet marked this pull request as draft May 30, 2023 07:59
@gsmet
Member

gsmet commented May 30, 2023

do you want me to turn the PR into a draft in the meantime?

I did it. String.split is really inefficient given you end up compiling the pattern again and again so I'm pretty sure compiling the Patterns will help. As for the StringTokenizer, it might even be better but who knows :).
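For reference, the precompiled-Pattern variant being discussed would look roughly like this (a sketch, not code from the PR): the Pattern is built once instead of per call, though the String[] and the per-part substrings are still allocated.

    // uses java.util.regex.Pattern and java.nio.charset.StandardCharsets
    private static final Pattern SEMICOLON = Pattern.compile(";");

    private static String getCharsetPrecompiled(String mimeType) {
        if (mimeType != null) {
            for (String part : SEMICOLON.split(mimeType)) {
                final String candidate = part.trim();
                if (candidate.startsWith("charset=")) {
                    return candidate.substring("charset=".length());
                }
            }
        }
        return StandardCharsets.UTF_8.name();
    }

(As the next comment points out, for a single-character separator String.split already skips compiling a Pattern, so this variant mostly matches the allocation profile of the existing code.)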

@franz1981
Contributor Author

franz1981 commented May 30, 2023

String.split is really inefficient given you end up compiling the pattern again and again so I'm pretty sure compiling the Patterns will help.

Not for single-character separators: there's an optimization in the JDK to avoid creating any pattern, see:
https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/java/lang/String.java#L2283

That implements something similar to what I do, but I cannot use it for a longer regexp, which is why I have rolled my own (more compliant with the spec, including the upper-case Charset variant too) and avoid the ArrayList creation visible in the JDK version of the same optimized code (which is not much more readable either, just hidden in the JDK layers!).
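Paraphrasing what that single-char fastpath does (simplified, not the actual JDK source): no Pattern gets compiled, but an ArrayList plus one substring per segment are still allocated, which is exactly the part avoided here.

    // Simplified paraphrase of the JDK single-char split fastpath.
    private static String[] singleCharSplit(String value, char delimiter) {
        final java.util.ArrayList<String> parts = new java.util.ArrayList<>();
        int off = 0;
        int next;
        while ((next = value.indexOf(delimiter, off)) != -1) {
            parts.add(value.substring(off, next));
            off = next + 1;
        }
        // last (or only) segment
        parts.add(value.substring(off));
        return parts.toArray(new String[0]);
    }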

Re StringTokenizer, it won't work the same because of this:
https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/java/util/StringTokenizer.java#L352

Starting from recent-ish (I'm getting old, really!) versions of the JDK, substring performs an actual copy of a region of the original String, meaning we would create as much garbage as all the parts we extract while parsing (to find the Charset), while in the version here I'm not creating any garbage other than the last found charset, and we have the option to pool that too, if necessary.

In summary, the tokenizer:

  • if the token is ;: it creates too many Strings
  • if the token is charset=: it won't really simplify the code in any form, and risks creating some additional Strings again (but that's minor, given that when charset= is found we rarely discard what's found!)
  • if the token is charset=: it requires an additional allocation to perform another O(n) search for Charset=

In short, to make it "right", it doesn't seem the right tool for the job.

@franz1981
Contributor Author

@gsmet An additional commit to improve the code quality (and performance too):

https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/http/AcceptCharsetParseBenchmark.java#L198 is the method using StringTokenizer; feel free to send a PR with other variants you want to try.

Reporting the code here for simplicity:

    private static String getCharsetTokenizer(String mimeType) {
        if (mimeType != null) {
            // walk the ';'-separated parts looking for a "charset=..." parameter
            var charsets = new StringTokenizer(mimeType, ";", false);
            while (charsets.hasMoreTokens()) {
                final String charset = charsets.nextToken().trim();
                if (charset.startsWith("charset=")) {
                    // return whatever follows "charset="
                    return charset.substring(8);
                }
            }
        }
        // no charset parameter found: default to UTF-8
        return StandardCharsets.UTF_8.name();
    }

The reason why it is slower, as reported below, is that it doesn't use any JVM intrinsic but performs the search one char at a time, while the current PR makes use of JVM-optimized methods to search for the charset string.

The results are these (for the same benchmark commit, and for e1cd392 as well):

Benchmark                                                                                    Mode  Cnt          Score         Error   Units
AcceptCharsetParseBenchmark.parseCharsetsNoSplit                                            thrpt   10     952434.710 ±   21222.183   ops/s
AcceptCharsetParseBenchmark.parseCharsetsNoSplit:·gc.alloc.rate.norm                        thrpt   10         ≈ 10⁻³                  B/op
AcceptCharsetParseBenchmark.parseCharsetsNoSplitNoCharsetNoSemicolon                        thrpt   10  170452304.981 ± 2858513.808   ops/s
AcceptCharsetParseBenchmark.parseCharsetsNoSplitNoCharsetNoSemicolon:·gc.alloc.rate.norm    thrpt   10         ≈ 10⁻⁶                  B/op
AcceptCharsetParseBenchmark.parseCharsetsSplit                                              thrpt   10     108427.065 ±   11989.058   ops/s
AcceptCharsetParseBenchmark.parseCharsetsSplit:·gc.alloc.rate.norm                          thrpt   10      14073.218 ±       0.215    B/op
AcceptCharsetParseBenchmark.parseCharsetsSplitNoCharsetNoSemicolon                          thrpt   10  170762816.388 ±  949852.366   ops/s
AcceptCharsetParseBenchmark.parseCharsetsSplitNoCharsetNoSemicolon:·gc.alloc.rate.norm      thrpt   10         ≈ 10⁻⁶                  B/op
AcceptCharsetParseBenchmark.parseCharsetsTokenizer                                          thrpt   10     139832.649 ±    2679.046   ops/s
AcceptCharsetParseBenchmark.parseCharsetsTokenizer:·gc.alloc.rate.norm                      thrpt   10      11168.963 ±       0.056    B/op
AcceptCharsetParseBenchmark.parseCharsetsTokenizerNoCharsetNoSemicolon                      thrpt   10   34951788.207 ± 1482975.472   ops/s
AcceptCharsetParseBenchmark.parseCharsetsTokenizerNoCharsetNoSemicolon:·gc.alloc.rate.norm  thrpt   10         ≈ 10⁻⁵                  B/op

In short, the new version is 10x faster and can be made zero-garbage (we just need to decide which charsets to look up).
The StringTokenizer version is slightly faster than the existing code (but still slower than this PR), and the two allocate similarly.

@franz1981 franz1981 marked this pull request as ready for review June 13, 2023 07:42
@franz1981
Contributor Author

Thanks to the latest changes mentioned in #33693, this PR is now a bit more relevant, because there is no longer a single big bottleneck that dominates compared to this one.

@franz1981 franz1981 force-pushed the get_charset_less_garbage branch from 7be2371 to f3e89f8 Compare June 27, 2023 08:21
@franz1981
Contributor Author

@jmartisk @gsmet Let me know if the code is now readable enough and clear in what it does 👍

@franz1981 franz1981 force-pushed the get_charset_less_garbage branch from f3e89f8 to e52f492 Compare June 27, 2023 08:40
@franz1981 franz1981 force-pushed the get_charset_less_garbage branch from e52f492 to eb3ae88 Compare June 27, 2023 09:32
@quarkus-bot

quarkus-bot bot commented Jun 27, 2023

✔️ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

@geoand geoand merged commit 1eb8437 into quarkusio:main Jun 28, 2023
@quarkus-bot quarkus-bot bot added this to the 3.3 - main milestone Jun 28, 2023