Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid ClassLoader identityHashCode and fix concurrency of ctx manager lookup table #443

Closed
wants to merge 2 commits into from

Conversation

franz1981
Copy link

It replaces #399

Quoting the @Sanne comments at #399 (comment)

we could optimize for very few ? e.g. I believe Quarkus in the most common production configuration has two - seems fit for a linear scan. I don't know about other runtimes using this library but I'd expect them to run also with very few classloaders; e.g. WildFly has many classloaders but I expect only a handful would be visible to this - theory that would need to be verified though.

If this approach is pushing too far the assumption about being very few ...

  1. I could consider adding property to let the application decide if uses this one, or the previous implementation (which I need to correctly pack in a separate class)
  2. I can make it super dynamic and expose instead, a sensible cutoff value (or none, by deciding ourselves for the users!), which can transform the array map into a proper one (likely need something thread-safe :/)

@Sanne
Copy link
Contributor

Sanne commented Apr 6, 2024

Looks great!

@franz1981
Copy link
Author

franz1981 commented Apr 7, 2024

@Sanne I've prepared a microbenchmark for this

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

@State(Scope.Thread)
@Warmup(iterations = 10, time = 400, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 5, time = 200, timeUnit = TimeUnit.MILLISECONDS)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Fork(2)
public class IndentityLookup {

    private SingleWriterCopyOnWriteArrayIdentityMap<ClassLoader, Object> arrayMap;
    private Map<ClassLoader, Object> hashMap;
    @Param({ "2", "10"})
    public int size;
    private ClassLoader missingClassLoader;
    private ClassLoader firstClassLoader;
    private ClassLoader lastClassLoader;

    @Setup
    public void init() {
        arrayMap = new SingleWriterCopyOnWriteArrayIdentityMap<>();
        hashMap = new HashMap<>();

        // create N different instances of ClassLoader
        missingClassLoader = new ClassLoader() { };

        for (int i = 0; i < size; i++) {
            var classLoader = new ClassLoader() { };
            if (i == 0) {
                firstClassLoader = classLoader;
            }
            if (i == size - 1) {
                lastClassLoader = classLoader;
            }
            arrayMap.put(classLoader, new Object());
            hashMap.put(classLoader, new Object());
        }
    }

    @Benchmark
    public Object arrayMapGetMissing() {
        return arrayMap.get(missingClassLoader);
    }

    @Benchmark
    public Object hashMapGetMissing() {
        return hashMap.get(missingClassLoader);
    }

    @Benchmark
    public Object arrayMapGetFirst() {
        return arrayMap.get(firstClassLoader);
    }

    @Benchmark
    public Object hashMapGetFirst() {
        return hashMap.get(firstClassLoader);
    }

    @Benchmark
    public Object arrayMapGetLast() {
        return arrayMap.get(lastClassLoader);
    }

    @Benchmark
    public Object hashMapGetLast() {
        return hashMap.get(lastClassLoader);
    }
}

These are the results

Benchmark                           (size)   Mode  Cnt     Score    Error   Units
IndentityLookup.arrayMapGetFirst         2  thrpt   10  1436.097 ± 33.765  ops/us
IndentityLookup.arrayMapGetFirst        10  thrpt   10  1440.117 ± 37.064  ops/us
IndentityLookup.arrayMapGetLast          2  thrpt   10   666.662 ±  3.000  ops/us
IndentityLookup.arrayMapGetLast         10  thrpt   10   332.156 ±  5.896  ops/us
IndentityLookup.arrayMapGetMissing       2  thrpt   10   640.518 ±  7.431  ops/us
IndentityLookup.arrayMapGetMissing      10  thrpt   10   301.235 ±  2.458  ops/us
IndentityLookup.hashMapGetFirst          2  thrpt   10   651.625 ±  5.688  ops/us
IndentityLookup.hashMapGetFirst         10  thrpt   10   647.957 ±  6.822  ops/us
IndentityLookup.hashMapGetLast           2  thrpt   10   652.398 ±  7.165  ops/us
IndentityLookup.hashMapGetLast          10  thrpt   10   652.113 ±  1.399  ops/us
IndentityLookup.hashMapGetMissing        2  thrpt   10   865.340 ±  8.298  ops/us
IndentityLookup.hashMapGetMissing       10  thrpt   10   870.814 ±  6.873  ops/us

The worst case scenario (if we ignore, the missing case, which shouldn't happen, right @FroMage ?) is
*GetLast and for such, with 2 class loaders, the performance vs HashMap is actually the same.
With 10 class loaders, instead, the array map is even slower (!!!).
I've yet to analyze the results, but as it is now I'm putting this in draft

@franz1981 franz1981 marked this pull request as draft April 7, 2024 16:36
@franz1981
Copy link
Author

franz1981 commented Apr 7, 2024

mmm I'm not very happy, still, because my benchmark seems to not trigger the same behavior as the real code

image

which is very different from the real one

image

that's hitting https://github.com/openjdk/jdk/blob/jdk-21%2B35/src/hotspot/share/prims/jvm.cpp#L611

Still investigating what's going on. -prof perfasm seems to agree that the hash code used is different: it seems related the HashMaps hash method which maybe have its type polluted; let's try differently...

In short, it could:

  • be the original problem a quirk of the profiler which was attached before the identity hash code is using the intensified version: I can easily verify it by removing the warmup od JMH and attach the profiler
  • be a real problem due to HashMap call chain till hash code which is not inlined AND has observed too many types there (but for the type profile pollution I would have expected a vtable_stub presence, which is not..)

@franz1981
Copy link
Author

franz1981 commented Apr 8, 2024

@Sanne This is FUN

look at the benchmark at:
https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/concurrent/IndentityLookup.java

these are the numbers, now fixed to match the actual behavior in the benchmark (and MAYBE is what you can observe in hibernate too (!!!!) - eg )

Benchmark                                   (disableHashCodeIntrinsics)  (size)   Mode  Cnt     Score    Error   Units
IndentityLookup.arrayMapGetFirst                                   true       2  thrpt   10  1453.559 ± 15.492  ops/us
IndentityLookup.arrayMapGetFirst                                   true      10  thrpt   10  1485.773 ± 18.988  ops/us
IndentityLookup.arrayMapGetLast                                    true       2  thrpt   10   687.806 ± 29.120  ops/us
IndentityLookup.arrayMapGetLast                                    true      10  thrpt   10   337.043 ±  5.881  ops/us
IndentityLookup.arrayMapGetMissing                                 true       2  thrpt   10   652.220 ±  7.062  ops/us
IndentityLookup.arrayMapGetMissing                                 true      10  thrpt   10   342.773 ±  2.803  ops/us
IndentityLookup.classLoaderHashCode                                true       2  thrpt   10   109.970 ± 12.456  ops/us
IndentityLookup.classLoaderHashCode                                true      10  thrpt   10   108.434 ± 14.299  ops/us
IndentityLookup.hashMapGetFirst                                    true       2  thrpt   10   104.966 ±  7.004  ops/us
IndentityLookup.hashMapGetFirst                                    true      10  thrpt   10   107.667 ±  0.808  ops/us
IndentityLookup.hashMapGetLast                                     true       2  thrpt   10   108.373 ±  0.914  ops/us
IndentityLookup.hashMapGetLast                                     true      10  thrpt   10   108.948 ±  1.086  ops/us
IndentityLookup.hashMapGetMissing                                  true       2  thrpt   10   110.337 ±  0.735  ops/us
IndentityLookup.hashMapGetMissing                                  true      10  thrpt   10   109.889 ±  1.061  ops/us

for example now correctly produce

Screenshot from 2024-04-08 14-01-23

which matches what we observed in the real code.

I can undraft, but now ...

why we have synchronized vs ClassLoader(s) instances in Quarkus???

because if we won't, there is no need of this collection...

@franz1981 franz1981 marked this pull request as ready for review April 8, 2024 12:02
@franz1981
Copy link
Author

franz1981 commented Apr 8, 2024

For hibernate too:
Screenshot from 2024-04-08 14-17-47

given that identity hash map piggyback the same intrinsics

@FroMage
Copy link
Contributor

FroMage commented Apr 8, 2024

why we have synchronized vs ClassLoader(s) instances in Quarkus???

I don't think this is needed in Quarkus, honestly. This is for WildFly and other application servers with one CL per application.

@franz1981
Copy link
Author

I don't understand your comment @FroMage : My point is that the performance issue I've discovered in Quarkus happen because some of the Classloader(s) stored in the map are used as objects to perform synchronization...and it seems weird to me.

@franz1981 franz1981 closed this Apr 8, 2024
@franz1981 franz1981 reopened this Apr 8, 2024
@FroMage
Copy link
Contributor

FroMage commented Apr 8, 2024

I'm saying that supporting one set-up per CL was added for Application Servers, not for Quarkus. In the case of Quarkus, we have a single ClassLoader and a single SmallRyeContextManager, so this entire map is not even useful for Quarkus

@franz1981
Copy link
Author

franz1981 commented Apr 8, 2024

In the case of Quarkus, we have a single ClassLoader and a single SmallRyeContextManager, so this entire map is not even useful for Quarkus

I would say the opposite actually: look at the numbers, I can show what happen with a single one too.

This is with size = 1:

Benchmark                                   (disableHashCodeIntrinsics)  (size)   Mode  Cnt     Score    Error   Units
IndentityLookup.arrayMapGetFirst                                   true       1  thrpt   10  1474.012 ± 19.736  ops/us
IndentityLookup.arrayMapGetLast                                    true       1  thrpt   10  1477.383 ± 18.811  ops/us
IndentityLookup.arrayMapGetMissing                                 true       1  thrpt   10   636.047 ±  5.564  ops/us
IndentityLookup.hashMapGetFirst                                    true       1  thrpt   10   108.808 ±  1.781  ops/us
IndentityLookup.hashMapGetLast                                     true       1  thrpt   10   109.350 ±  0.794  ops/us
IndentityLookup.hashMapGetMissing                                  true       1  thrpt   10   110.651 ±  2.243  ops/us
IndentityLookup.classLoaderHashCode                                true       1  thrpt   10   112.808 ± 12.129  ops/us

The hashMap* variants are ~14X at worst slower throughput (we usually don't have the missing case), and fixing it will make such lookup a no cost.

@FroMage
Copy link
Contributor

FroMage commented Apr 8, 2024

I am not questioning your finding that this is expensive. I'm saying in Quarkus we don't even need to do this expensive thing, so we should just avoid doing it.

@Sanne
Copy link
Contributor

Sanne commented Apr 8, 2024

@FroMage we do actually have multiple classloaders - but yes since they are very simple, that's in fact why I had suggested in the linked discussion that we could do something "simpler"

@FroMage
Copy link
Contributor

FroMage commented Apr 8, 2024

@FroMage we do actually have multiple classloaders

Yeah, but not in the way it's intended for Application Servers where each app has one CL and they have different config for CP because they're all different applications.

@Sanne
Copy link
Contributor

Sanne commented Apr 8, 2024

@FroMage looks at the data.. there's evidence that this is a problem, I've seen these same flames sparkling myself as well :)

Lol aren't we all just divining shamans..

@Sanne
Copy link
Contributor

Sanne commented Apr 8, 2024

I am not questioning your finding that this is expensive. I'm saying in Quarkus we don't even need to do this expensive thing, so we should just avoid doing it.

Avoiding the lookup would be even better of course. But do you have a solution?

@franz1981
Copy link
Author

franz1981 commented Apr 8, 2024

I'm now checking if i can retrieve this information using some JDK event or by modifying the JDK myself...

Right now is not nice I cannot see MonitorExit...so maybe is not capturing what I need:

Screenshot from 2024-04-08 15-02-36

on https://chriswhocodes.com/jfr_jdk17.html

@FroMage
Copy link
Contributor

FroMage commented Apr 8, 2024

Avoiding the lookup would be even better of course. But do you have a solution?

I might, let me reopen this project after several years… Also, not what I had planned for this afternoon.

@FroMage
Copy link
Contributor

FroMage commented Apr 8, 2024

I am swapping back memory of this from tape storage, let me tell you :-/

@FroMage
Copy link
Contributor

FroMage commented Apr 8, 2024

Can you try this PR, @franz1981 ? https://github.com/quarkusio/quarkus/compare/main...FroMage:quarkus:cp-perf-manager?expand=1

Note that I'm wondering what sort of application calls these methods, but whatever.

@Sanne
Copy link
Contributor

Sanne commented Apr 8, 2024

@FroMage hooo that looks so nice.. but you could have given this hint a year ago when we spotted this problem! 🗡️ 💘

@franz1981
Copy link
Author

franz1981 commented Apr 8, 2024

@FroMage these are the best solutions to me :) (removing code win too)
Soooo...I've sent a run in a benchmrk, will take some, and hope will compile and I let you know, but if it does what I think is doing...it's done :)

no it didn't work...https://github.com/FroMage/quarkus/actions/runs/8600808107/job/23566549448

@FroMage
Copy link
Contributor

FroMage commented Apr 8, 2024

Yeah, silly me, forgetting formatting :) Fixed now.

@franz1981
Copy link
Author

It seems that it has run somehow - sadly i am using a CI job for performance testing. If it was on my laptop I would have take care of doing it myself, while on the job is a PITA, because I have to update the scripts just for this run back/forth (and I don't like Jenkins, nor it likes me)

@franz1981
Copy link
Author

@FroMage
Copy link
Contributor

FroMage commented Apr 9, 2024

Tests did not pass. I'm looking into why

@FroMage
Copy link
Contributor

FroMage commented Apr 9, 2024

It appears some code might be using it before it's set up. Which might be revealing of a problem we've always had.

@FroMage
Copy link
Contributor

FroMage commented Apr 9, 2024

Mutiny, and FT, specifically.

@franz1981
Copy link
Author

You mean your change @FroMage ? Because I like it so much that I cannot see anymore a reason to have this pr opened; it is just a doc why we end up in another place, no?

@FroMage
Copy link
Contributor

FroMage commented Apr 9, 2024

Yeah, my Quarkus fix. It doesn't run CI. But I should be able to fix that.

@franz1981
Copy link
Author

franz1981 commented Apr 9, 2024

Let me close this than, so can resume any conversation elsewhere (if needed).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants