3.2.4.Final is slower compared to 2.15.1.Final #35392

Karthikrao121 · 2023-08-17T07:18:33Z

Description

We recently migrated to the latest version of quarkus. After the upgrade to the latest version we are seeing slower startup times compared to the 2.15.1.Final version.

native (powered by Quarkus 2.15.1.Final) started in 0.308s.
native (powered by Quarkus 3.2.4.Final) started in 0.423s.

Need migration guide for performance enhancements on 3.2.4.Final.

Implementation ideas

No response

gsmet · 2023-08-17T07:44:33Z

Need migration guide for performance enhancements on 3.2.4.Final.

I think we would first need a lot more data :).

The list of extensions would be a good start. For instance, we know that Hibernate ORM 6 comes with more classes that need to be initialized and causes a slow down.

Then I would continue by checking what's happening in JVM mode and if it is significantly slower as well, you could have a look at https://github.com/quarkusio/quarkus/blob/main/TROUBLESHOOTING.md to profile it.

Then we might want to actually profile the native executable but I'm not sure how to do that. @zakkak would know if it's possible.

zakkak · 2023-08-17T08:58:11Z

Then we might want to actually profile the native executable but I'm not sure how to do that. @zakkak would know if it's possible.

That's definitely possible, but as you noted it should be the last step :)

You can see some examples in https://quarkus.io/guides/native-reference#inspecting-and-debugging

Furthermore, @galderz and @roberttoyonaga recently wrote a couple of great guides related to profiling native executables:

Karthikrao121 · 2023-08-17T13:20:26Z

These are from 2.15.1.Final
Installed features: [amazon-dynamodb, amazon-lambda, amazon-s3, amazon-secretsmanager, amazon-ssm, cdi, logging-json, rest-client-reactive, rest-client-reactive-jackson, resteasy-reactive, resteasy-reactive-jackson, security, smallrye-context-propagation, smallrye-openapi, vertx]"
These are from 3.2.4.Final
"Installed features: [amazon-dynamodb, amazon-lambda, amazon-s3, amazon-secretsmanager, amazon-ssm, cdi, logging-json, rest-client-reactive, rest-client-reactive-jackson, resteasy-reactive, resteasy-reactive-jackson, security, smallrye-context-propagation, vertx]"

maxandersen · 2023-08-28T07:19:41Z

@Karthikrao121 have you tried some of suggestions above to trouble shoot where changes in performance occur? are you only seeing startup changes or also changes in memory and throughput runtime performance?

Karthikrao121 · 2023-08-28T14:40:08Z

@maxandersen memory consumption also increased on 3.2.4.Final.
on 2.15.Final Memory Size: 128 MB Max Memory Used: 116 MB startup times are around 400 ms to 500 ms
on 3.2.4.Final Memory Size: 256 MB Max Memory Used: 154 MB even with memory increase startup times are consistently over 600ms

gsmet · 2023-08-28T14:45:25Z

@Karthikrao121

For the startup time, have a look at https://github.com/quarkusio/quarkus/blob/main/TROUBLESHOOTING.md , you should be able to get a CPU profile that would be a good start. Get one for both versions so that we can compare.

For the memory, it would be helpful if you could get a heap dump.

There's a good chance you won't want to make these public but if you're comfortable sharing them with me (and they don't contain sensitive information), you can send me an email with a download link at gsmet at redhat dot com.

The other option is that you have a look yourself and share your insights.

Thanks.

Karthikrao121 · 2023-08-29T14:16:51Z

@gsmet please provide your insights on the shared CPU profile details.

geoand · 2023-08-30T05:14:55Z

@franz1981 do we have any resources to help @Karthikrao121 answer the question above?
I can obviously point to the AsyncProfiler repo, but I am wondering if we have any of our own resources about using it :)

franz1981 · 2023-08-30T06:08:42Z

Sadly we don't have any resources re quarkus for async profiler, because it's a kind of "moving target".

Before using async profiler we need to understand first if they can observe the same behaviour in JVM mode.
Regardless, If they are running in a container or not, and how many resources they dedicate to it. The latter part is very important, because an increased allocation pressure can pass nearly unnoticed with enough available memory (GCs works like that); in short: if they have lessen constrains (cpu/mem) things get better? If yes, which resource impact the most?
That would narrow If is just a memory related problem (as it seems).

As suggest by @gsmet If they could collect profiling data and/or use the method from @galderz to check if more (native) allocations are happening and where, on native mode, it would be great/perfect. If they can provide collapsed/folded profiling data of the 2 versions, flamegraphs allow to easily create diff flamegraphs for comparison, which are pretty handy (see https://www.brendangregg.com/blog/2014-11-09/differential-flame-graphs.html).

If they cannot troubleshoot in native we have 2 other ways:

the same behaviour is reproducible with JVM mode (in an unconstrained env, CPU and memory-wise)? If yes, and it's related allocation, they can use Startup RSS regression introduced by #35246 #35406 (comment) to provide data about allocations happening at startup in the 2 cases. It would help to trivially find and fix this, given that the difference in memory used was noticiable
Occam rule: exclude the easier things to check between the versions which can impact startup time. Context propagation should have changed a lot; they really need it? What happen if they tune it to reduce its overhead? @mkouba told me about a property to do it (maybe @geoand remember as well). It's just an example, I don't know if ctx prop can impact startup time, but I hope it's clear what I meant

Karthikrao121 · 2023-08-30T13:33:33Z

I did provide the async cpu profiles from both versions to @gsmet. what we observed is if we increase the memory and cpu the start up times for 3.2.4.Final are as same as 2.15.1.Final. But 2.15.1.Final is doing faster start up times at 128 MB where as 3.2.4.Final is doing the same start up times at around 700 MB to 1024 MB.

franz1981 · 2023-08-30T14:06:39Z

Which seems to suggest:

either the live set is increased and the available one to allocate young gen obj is reduced; which means that less memory will make more work to be moved into the slower major/mixed pauses (JVM mode eh), and more moving/compacting activity from the GC perspective (which, with less CPU resources, code compilation will kick in slower, and application code will pause more frequently)
we allocate more volatile/temporary objects and having less memory would make more obj to prematurely become tenured (because surviving more, If their life cycles is short both not short enough): the effects on the CPU resources are similar to the previous point, given that GC will behave the same

I will stop my wild guesses and waiting the @gsmet analysis

Thanks for sharing the data with him already

galderz · 2023-08-31T04:14:27Z

But 2.15.1.Final is doing faster start up times at 128 MB where as 3.2.4.Final is doing the same start up times at around 700 MB to 1024 MB.

What are these memory sizes you mention? -Xmx values? Container memory size limits?

galderz · 2023-08-31T04:20:10Z

Also, can you post here what you sent to @gsmet?

Karthikrao121 · 2023-08-31T04:20:40Z

These are memory allocations on AWS lambda.

gsmet · 2023-09-04T11:45:58Z

@Karthikrao121 sorry, I just found your email in my spam folder after my PTO. I think what you sent is dev mode profiling which makes it hard to pinpoint potential slow downs in your app itself.
Also given you found the regression with 3.2.4.Final, could you stick to this version? We have a known regression in 3.3.0 so that will make it harder to track what you noticed in 3.2.4 already.

What we would need is this one: https://github.com/quarkusio/quarkus/blob/main/TROUBLESHOOTING.md#profiling-application-startup-with-async-profiler , profiling startup by running the application jar.

Send me a flamegraph with 2.x and the same app with 3.2.4.Final.

Thanks!

franz1981 · 2023-09-04T12:01:45Z

@gsmet still on PTO but to save troubles due to different architecture they can use https://github.com/jvm-profiling-tools/ap-loader which pack different binaries in.
In addition I suggest to produce jfr output which allow to extract much more info via jfr2flame conversion eg --classify, --lines and others...
@gsmet we can have a quick call in 2 days when I am back too

gsmet · 2023-09-04T12:13:24Z

Let's start simple first. And if we need additional instructions in TROUBLESHOOTING.md, we should work on it and make them as easy to follow as possible.

geoand · 2023-09-04T12:14:05Z

And if we need additional instructions in TROUBLESHOOTING.md, we should work on it and make them as easy to follow as possible.

+1

franz1981 · 2023-09-04T12:16:17Z

Yeah, my point is to save users to capture data over and over: If they capture the data in jfr format we can extract different types of flamegraphs ourselves, without requiring the users to collect more...

franz1981 · 2023-09-24T08:35:00Z

@Karthikrao121 any update?

@radcortez worked hard on improving few aspects re performance config which had a tremendous impact on both native and JVM mode startup time; I'm not aware which specific versions you can try and check if deliver the expected level of performance, maybe Roberto or @gsmet knows, and If you will give it a shot this can give us an additional (indipendent) feedback.

Additionally, in our https://quarkus.io/guides/performance-measure we explained that startup time is for us what makes the application available to users, which imply awaiting the first request completion. This, compared to what quarkus print as startup time, include few more class loading and initializations which can affect the final numbers: did you perform the same measurements in your startup time tests?
I am saying that, because this is likely the measure you really are interested in.

radcortez · 2023-09-25T09:26:39Z

@radcortez worked hard on improving few aspects re performance config which had a tremendous impact on both native and JVM mode startup time; I'm not aware which specific versions you can try and check if deliver the expected level of performance, maybe Roberto or @gsmet knows, and If you will give it a shot this can give us an additional (indipendent) feedback.

Yes, we have made a bunch of improvements to both start time and RSS in both JVM and Native mode. A few of these improvements are already available in 3.4.x, but the big chunk should only be available in 3.5.x.

Karthikrao121 · 2023-09-29T14:22:33Z

Gsmet, The startup time decreased with 3.4.1 version around 100 ms . Sorry for the late reply. We have seen some improvements with the newer version.

…

On Mon, Sep 25, 2023, 04:26 Roberto Cortez ***@***.***> wrote: @radcortez <https://github.com/radcortez> worked hard on improving few aspects re performance config which had a tremendous impact on both native and JVM mode startup time; I'm not aware which specific versions you can try and check if deliver the expected level of performance, maybe Roberto or @gsmet <https://github.com/gsmet> knows, and If you will give it a shot this can give us an additional (indipendent) feedback. Yes, we have made a bunch of improvements to both start time and RSS in both JVM and Native mode. A few of these improvements are already available in 3.4.x, but the big chunk should only be available in 3.5.x. — Reply to this email directly, view it on GitHub <#35392 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACXWFT54OLZALHKER4Z3GPTX4FE5ZANCNFSM6AAAAAA3TT4OFI> . You are receiving this because you were mentioned.Message ID: ***@***.***>

radcortez · 2023-09-29T14:41:15Z

Gsmet, The startup time decreased with 3.4.1 version around 100 ms . Sorry for the late reply. We have seen some improvements with the newer version.
…

Good to hear; thanks for lettings us know.

Some additional improvements are coming in #35322

radcortez · 2024-03-01T23:53:14Z

I believe this can be closed due to the improvements done in:

Karthikrao121 added the kind/enhancement New feature or request label Aug 17, 2023

quarkus-bot bot added the triage/needs-triage label Aug 17, 2023

geoand added kind/bug Something isn't working triage/needs-feedback We are waiting for feedback. and removed kind/enhancement New feature or request triage/needs-triage labels Aug 21, 2023

geoand removed the triage/needs-feedback We are waiting for feedback. label Sep 20, 2023

radcortez closed this as completed Mar 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

3.2.4.Final is slower compared to 2.15.1.Final #35392

3.2.4.Final is slower compared to 2.15.1.Final #35392

Karthikrao121 commented Aug 17, 2023

gsmet commented Aug 17, 2023

zakkak commented Aug 17, 2023

Karthikrao121 commented Aug 17, 2023

maxandersen commented Aug 28, 2023

Karthikrao121 commented Aug 28, 2023 •

edited

Loading

gsmet commented Aug 28, 2023

Karthikrao121 commented Aug 29, 2023

geoand commented Aug 30, 2023 •

edited

Loading

franz1981 commented Aug 30, 2023 •

edited

Loading

Karthikrao121 commented Aug 30, 2023

franz1981 commented Aug 30, 2023 •

edited

Loading

galderz commented Aug 31, 2023

galderz commented Aug 31, 2023

Karthikrao121 commented Aug 31, 2023

gsmet commented Sep 4, 2023

franz1981 commented Sep 4, 2023 •

edited

Loading

gsmet commented Sep 4, 2023

geoand commented Sep 4, 2023

franz1981 commented Sep 4, 2023

franz1981 commented Sep 24, 2023 •

edited

Loading

radcortez commented Sep 25, 2023

Karthikrao121 commented Sep 29, 2023 via email

radcortez commented Sep 29, 2023

radcortez commented Mar 1, 2024

3.2.4.Final is slower compared to 2.15.1.Final #35392

3.2.4.Final is slower compared to 2.15.1.Final #35392

Comments

Karthikrao121 commented Aug 17, 2023

Description

Implementation ideas

gsmet commented Aug 17, 2023

zakkak commented Aug 17, 2023

Karthikrao121 commented Aug 17, 2023

maxandersen commented Aug 28, 2023

Karthikrao121 commented Aug 28, 2023 • edited Loading

gsmet commented Aug 28, 2023

Karthikrao121 commented Aug 29, 2023

geoand commented Aug 30, 2023 • edited Loading

franz1981 commented Aug 30, 2023 • edited Loading

Karthikrao121 commented Aug 30, 2023

franz1981 commented Aug 30, 2023 • edited Loading

galderz commented Aug 31, 2023

galderz commented Aug 31, 2023

Karthikrao121 commented Aug 31, 2023

gsmet commented Sep 4, 2023

franz1981 commented Sep 4, 2023 • edited Loading

gsmet commented Sep 4, 2023

geoand commented Sep 4, 2023

franz1981 commented Sep 4, 2023

franz1981 commented Sep 24, 2023 • edited Loading

radcortez commented Sep 25, 2023

Karthikrao121 commented Sep 29, 2023 via email

radcortez commented Sep 29, 2023

radcortez commented Mar 1, 2024

Karthikrao121 commented Aug 28, 2023 •

edited

Loading

geoand commented Aug 30, 2023 •

edited

Loading

franz1981 commented Aug 30, 2023 •

edited

Loading

franz1981 commented Aug 30, 2023 •

edited

Loading

franz1981 commented Sep 4, 2023 •

edited

Loading

franz1981 commented Sep 24, 2023 •

edited

Loading