Performance fixes based on hotspot analysis in load tests #11148
Conversation
Sync from master
Looks interesting. Could you rebase? There is a first commit that looks weird.
@@ -116,6 +116,10 @@ public boolean equals(Object obj) {
     return false;
 }
 Key other = (Key) obj;
+// Shortcut removes hotspot on contextual.equals
+if (contextual == other.contextual) {
Hm, at the moment we don't implement equals()/hashCode() for generated bean classes, so !contextual.equals(other.contextual) should be equivalent to !(contextual == other.contextual). In other words, this modification could save one java.lang.Object.equals(Object) invocation (which is very likely negligible, although it removes the equals method from the hot path). Long.toHexString() probably saves a lot more, because String.format() creates a formatter object, parses the format string, etc.
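Both claims above can be checked with a small sketch (the GeneratedBean class below is a hypothetical stand-in, not an actual generated bean class): without an equals() override, Object.equals() falls back to reference equality, and Long.toHexString() yields the same digits as String.format("%x", ...) without the Formatter overhead.

```java
// Hypothetical stand-in for a generated bean class: like the generated bean
// classes discussed above, it does not override equals()/hashCode().
class GeneratedBean {
}

public class EqualsAndHexDemo {
    public static void main(String[] args) {
        GeneratedBean a = new GeneratedBean();
        GeneratedBean b = new GeneratedBean();

        // With no override, Object.equals() is reference equality, so the
        // equals() call and the == shortcut agree on every pair.
        System.out.println(a.equals(a) == (a == a)); // prints "true"
        System.out.println(a.equals(b) == (a == b)); // prints "true"

        // Long.toHexString() produces the same digits as String.format("%x", ...)
        // without creating a Formatter and parsing a format string.
        long id = 123456789L;
        System.out.println(Long.toHexString(id).equals(String.format("%x", id))); // prints "true"
    }
}
```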
@mkouba so what you're saying is we should drop the equals() call altogether?
Nope. I'd like to keep it. The fact that we don't implement it now does not mean we'll never need to implement it.
OK. But if it ended up being a hot spot for Paul, there's something weird going on.
It's hard to say without the app and test sources. CC @bcluap
So, I just saw that the profiler used is the one from VisualVM; it injects bytecode and can cause deoptimizations. That's why it can mislead you into this kind of hotspot.
When working on low-level optimizations (and avoiding a method call is very low level), you should use low-level tools like async-profiler. We have a page that explains how to use it: https://github.com/quarkusio/quarkus/blob/master/TROUBLESHOOTING.md
So again, the hotspot on the equals method call is most likely a profiler artifact, not a real hotspot.
And a 1% difference between two load-testing runs is very slight, so it can fall within the measurement error range.
Such small performance enhancements should be validated via microbenchmarking using JMH ...
Such small performance enhancements should be validated via microbenchmarking using JMH ...
Yes, we talked about this in the comments below, e.g. #11148 (comment).
If you have many possible concrete types implementing Contextual, the dispatch to find the right equals implementation becomes a very expensive megamorphic call.
+1 for the shortcut
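A minimal sketch of the megamorphic-dispatch point (Contextual, BeanA, BeanB, and BeanC here are simplified illustrations, not the real ArC interfaces): once several concrete types flow through the same equals() call site, the JIT can no longer devirtualize and inline a single target, while the reference comparison never dispatches at all.

```java
// Simplified stand-in for the real Contextual interface.
interface Contextual {
}

// With three or more concrete types reaching one call site, the JIT treats
// the site as megamorphic and falls back to dynamic dispatch.
class BeanA implements Contextual { }
class BeanB implements Contextual { }
class BeanC implements Contextual { }

public class DispatchDemo {
    static boolean sameContextual(Contextual x, Contextual y) {
        if (x == y) {
            return true; // fast path: no virtual equals() dispatch at all
        }
        return x != null && x.equals(y); // slow path: potentially megamorphic call
    }

    public static void main(String[] args) {
        Contextual a = new BeanA();
        System.out.println(sameContextual(a, a));           // prints "true"
        System.out.println(sameContextual(a, new BeanB())); // prints "false"
    }
}
```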
As mentioned above, we removed the need for equals() from the hot path completely...
As mentioned above we removed the need for equals() from the hot path completely...

Sure, that's even better. I just meant to suggest a possible explanation for equals() being, in some contexts, really not that efficient. Removing the field is even better!
Yea that .equals one was a bit weird but it popped up as the hottest method
under io.quarkus in my tests. It was being called 2.5m times per second...
The change resulted in it coming off the hot list.
Sure, because the method is not invoked anymore. It's replaced with the …
Hmmmm. Is it normal that this method is called so many times?
It depends. It's a load test anyway. But every time you invoke a method upon a client proxy we have to do a lookup, and in this case create a new key and invoke …
This was the analysis I found. This run was without the change:
[image: image.png]
And with the change:
[image: image.png]
I can't understand why it makes a big enough impact to take AbstractSharedContext.$Key.equals off the list, as it is still called and is just one hop away from the bean's default .equals, but it does make a difference, both in my test throughput and in the hotspot analysis.
I can't see the attached images...
The reason the .equals is called so much is due to traces like this: every method call to a managed bean (application scoped or request scoped) results in a lookup in the ConcurrentHashMap to find the correct delegate. We have a large number of helpers in our architecture which use other helpers, and these are all ApplicationScoped so they can be easily injected when needed. As a call jumps around a bit and uses persistence and other things, these quickly add up and can end up using say 200 managed beans in a complex call. I've subsequently changed them all to Singletons to avoid the proxying, but it's probably still worth trying to optimise the resolution of beans as much as possible.
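The lookup pattern described above can be sketched roughly as follows (names and types are illustrative, not the actual AbstractSharedContext code): every proxied call funnels through a shared map, so the per-call cost of hashing and equals() is multiplied by the number of proxied beans a request touches.

```java
import java.util.concurrent.ConcurrentHashMap;

// Rough sketch of the delegate lookup behind a client proxy. A request that
// touches ~200 proxied beans performs ~200 of these lookups.
public class ProxyLookupSketch {
    static final ConcurrentHashMap<String, Object> instances = new ConcurrentHashMap<>();

    static Object delegateFor(String beanId) {
        // computeIfAbsent hashes the key and, when buckets collide, falls
        // back to equals() -- the call that showed up in the hotspot analysis.
        return instances.computeIfAbsent(beanId, id -> new Object());
    }

    public static void main(String[] args) {
        Object first = delegateFor("orderService");  // "orderService" is a made-up bean id
        Object second = delegateFor("orderService");
        System.out.println(first == second); // prints "true": same delegate reused
    }
}
```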
I think that we need more information. These two snapshots don't seem to be comparable. The total time of the first one is 79 732 ms and the total time of the second one is 114 359 ms. Also after the change, the … However, I really want to find the problem. So I'm going to update the ClientProxyInvocationBenchmark microbenchmark to see how your modification improves the throughput. Maybe we could find even more optimizations. In general, we could skip the lookup for …
Maybe we can limit the PR to the change in the Jaeger extension and get this part merged and backported? Then you can pursue your discussions on the rest?
+1
I wonder if it could be a problem of hash collisions and a large number of application scoped beans (so that equals() is called many times to find the correct bean instance). I'll prepare a branch to test this theory...
I did a simple test for that and found very few occasions where the method returned false, so I don't think it's that.
I did a micro-benchmark of sorts in the running load test, using System.nanoTime timings around the body of the function. With the change commented out, the average of 1000 random results was 55 ns; with it present it was 26 ns. So if we're doing, say, 200 bean calls per business method, that's about 0.0058 ms additional. In my test, each business method has a latency of 0.27 ms on average, so 0.0058 is about 2% of 0.27 ms, which aligns with my findings.
long start = System.nanoTime();
try {
    if (this == obj) {
        return true;
    }
    if (obj == null) {
        return false;
    }
    if (!(obj instanceof Key)) {
        return false;
    }
    Key other = (Key) obj;
    // if (contextual == other.contextual) {
    //     return true;
    // }
    if (!contextual.equals(other.contextual)) {
        return false;
    }
    return true;
} finally {
    long latency = System.nanoTime() - start;
    if (System.currentTimeMillis() % 100 == 0) {
        System.out.println(latency);
    }
}
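As a quick sanity check of the arithmetic in this comment, using the quoted figures (55 ns vs. 26 ns per call, 200 proxied-bean calls, 0.27 ms average business-method latency):

```java
// Back-of-the-envelope check of the estimate above.
public class OverheadEstimate {
    public static void main(String[] args) {
        int beanCalls = 200;
        double withoutShortcutNs = 55.0;
        double withShortcutNs = 26.0;

        // Time saved per business method, converted from ns to ms.
        double savedMs = beanCalls * (withoutShortcutNs - withShortcutNs) / 1_000_000.0;
        System.out.println(savedMs); // prints "0.0058"

        // As a fraction of the 0.27 ms average business-method latency.
        double methodLatencyMs = 0.27;
        System.out.printf("%.1f%%%n", 100.0 * savedMs / methodLatencyMs); // prints "2.1%"
    }
}
```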
Paul Carter-Brown
Hm, nanosecond improvements are often hard to prove unless you use a proper microbenchmark tool such as JMH and interpret the results correctly (which is not trivial ;-)). That said, I don't question your results; I just wanted to note that microbenchmarks are tricky. In any case, I've prepared a branch where we get rid of the … The ClientProxyInvocationBenchmark shows a 2-7% improvement compared to v1.6.1 (note that we're talking about ~100 million invocations per second and an app context with ~200 beans). @bcluap It would be great if you could give it a try with your load test... Could you also modify this PR to only include the Jaeger change? Thanks!
Yes, please let's get this PR only about the Jaeger change (and remove the other commit too). That way I can backport that one right away to 1.7.
I created a new PR for Jaeger: #11197
@bcluap Thanks for the PR.
Unfortunately the changes slowed things down...
That's funny. I'm getting curious about what exactly your load test does ;-). You're talking about tps? Does it mean "transactions per second"? If so, what kind of transactions are involved? Also, what kind of tool/profiler are you using?
Now there is a hotspot on the get itself...
That kind of makes sense because it's called many many times...
Hi Martin,
I'm actually busy creating a simple test case which can be called with ab
so you can see what I see.
Paul
I'm battling to explain what I'm seeing. I'll answer your questions but
here is a test project which forces a number of bean calls per rest request:
https://github.com/bcluap/quarkus-examples
Run the project in there and hit it with: ab -k -c100 -t10 -n100000000
http://localhost:8080/test1/go\?beanCalls\=200
Repeat a number of times and get an average. Try with different
implementations of the AbstractSharedContext and see.
For this project, Martin's change gives about a 10% increase in throughput
(requests per second) compared to the default in master.
On my large test project, however, Martin's is very slightly slower. Weird.
Paul
…On Tue, Aug 4, 2020 at 6:35 PM Loïc Mathieu wrote, commenting on independent-projects/arc/runtime/src/main/java/io/quarkus/arc/impl/AbstractSharedContext.java:
In JVM mode, JIT inlining will make the code path the same in no more than 1000 invocations (the short-method threshold), so this cannot be a hotspot. In native mode, I don't know if there is any inlining. Anyway, a method call is very cheap, so it may be a profiling artifact. Some profilers inject bytecode, so inlining doesn't happen and we see these kinds of false hotspots. This kind of hack can go wrong at some point, so I suggest dropping it.
Side question: which profiler was used? async-profiler doesn't inject bytecode, so it may not show this line as a hotspot.
@bcluap That's hard to guess. We would have to analyze your application to see which parts are involved. I suppose it's not possible to share your test project, right?
@bcluap please use async-profiler if possible, or Java Mission Control. You can follow our guide here: https://github.com/quarkusio/quarkus/blob/master/TROUBLESHOOTING.md
Other profilers have sampling bias that can mess with low-level optimizations.
Thanks for the feedback. I did some JMH profiling last night and can confirm that Martin's changes do improve the bean lookups quite a lot. I also did some more load tests and can see that we are working within the margin of error of the tests, so it's very difficult to ascertain that they actually slow my test down, as the results are not perfectly consistent.
You are spot on that VisualVM is not ideal for micro-optimisations, but as a quick first pass it does pick up a lot of issues. At first I thought that maybe the beans had a more complex .equals implementation, and hence the change I suggested would have made perfect sense. But seeing that the change actually just prevents one small method call, it will likely be inlined as you say.
Paul
I'm closing this one. We can iterate in subsequent PRs if we find room for other optimizations.
These two changes result from hotspot analysis of a load test using the Jaeger extension and lots of application-scoped beans. The changes result in both the .equals and String.format calls falling off the hotspot list and improve throughput by more than 1%.