
Made LazyValue virtual thread friendly #34479

Merged - 1 commit merged into quarkusio:main on Jul 26, 2023

Conversation

@imperatorx (Contributor, PR author) commented Jul 3, 2023

When a bean is first created by ArC because a caller on a virtual thread accesses it, the carrier thread gets pinned. This simple change replaces the synchronized blocks with a ReentrantLock.
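
For illustration only, a minimal sketch of the kind of change described here (not the actual ArC code): double-checked lazy initialization guarded by a ReentrantLock, so a virtual thread waiting for the supplier parks instead of pinning its carrier thread. On the uncontended fast path this is still just a volatile read.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

public class LazyValue<T> {

    private final Supplier<T> supplier;
    private final ReentrantLock lock = new ReentrantLock();
    private volatile T value;

    public LazyValue(Supplier<T> supplier) {
        this.supplier = supplier;
    }

    public T get() {
        T val = value;
        if (val == null) {
            lock.lock(); // a virtual thread parks here instead of pinning its carrier
            try {
                val = value;
                if (val == null) {
                    val = supplier.get();
                    value = val;
                }
            } finally {
                lock.unlock();
            }
        }
        return val;
    }
}
```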

@quarkus-bot quarkus-bot bot added the area/arc Issue related to ARC (dependency injection) label Jul 3, 2023
@franz1981 (Contributor) commented Jul 3, 2023

> When a bean is first created by ArC because a caller on a virtual thread accesses it, the carrier thread gets pinned. This simple change replaces the synchronized blocks with a ReentrantLock.

I don't think this is a good idea: the overhead of a ReentrantLock under contention is higher (it actually creates real garbage by adding waiters to the lock's waiter list!), and when the synchronized region is not held for "long" you don't want to be descheduled.
When the computation inside the synchronized block is CPU-bound and short, it's better to use synchronized.
Instead, if you are measuring that contention is the problem, we should try a different way to fix it, or differentiate between lazy types (those whose init phase involves I/O or long operations and those that don't).

So, in short: are you observing pinning in some load scenario @imperatorx?
If yes, please share some data (or I can help collect it) and we can find a different way to solve it.

@imperatorx (Contributor, PR author) commented Jul 3, 2023

> So, in short: are you observing pinning in some load scenario @imperatorx? If yes, please share some data (or I can help collect it) and we can find a different way to solve it.

Yes, some of my lazy beans access remote services or databases in their @PostConstruct hooks, producer methods, or constructors. I see that LazyValue is used for more than just bean holders, but if it were used just for beans, surely the contended locking scenario would be rare?

@franz1981 (Contributor) commented Jul 4, 2023

@mkouba is this a common usage scenario for users?
If yes, I am fine with changing what we have, or we could provide different lazy primitives for cases where the init is expected to be I/O intensive, wdyt?

@mkouba (Contributor) commented Jul 4, 2023

> @mkouba is this a common usage scenario for users? If yes, I am fine with changing what we have, or we could provide different lazy primitives for cases where the init is expected to be I/O intensive, wdyt?

I don't think it's a good idea to perform a long-running or I/O-intensive operation in the @PostConstruct callback of a bean, but it's not forbidden, so 🤷.

@mkouba (Contributor) commented Jul 4, 2023

FTR we have the same class (although not public) in Qute. It's only used to lazily init a map of template fragments (no I/O intensive operation).

@mkouba (Contributor) commented Jul 4, 2023

@franz1981 would a ReentrantReadWriteLock perform better? I.e. if we only acquire the write lock when necessary...

@franz1981 (Contributor) commented Jul 4, 2023

Nope @mkouba, read-write is even worse because it tracks read lock acquisitions, which again requires atomics, whereas now we just perform a volatile load in steady state.
I am more concerned here with understanding why @imperatorx is seeing I/O on init, given that you said that's not a good practice. Is there something users can do to avoid it?

@mkouba (Contributor) commented Jul 4, 2023

> Nope @mkouba, read-write is even worse because it tracks read lock acquisitions, which again requires atomics, whereas now we just perform a volatile load in steady state.

I see.

> I am more concerned here with understanding why @imperatorx is seeing I/O on init, given that you said that's not a good practice. Is there something users can do to avoid it?

It's not a good practice because you can easily block various things. Unfortunately, there is no async variant of @PostConstruct. In fact, there is no async API for bean construction at all.

In theory, a user can leverage CDI async events (or any other async facility) to trigger some I/O intensive init, e.g. something like:

```java
@ApplicationScoped
public class MyBean {

    volatile Object data;

    @Inject
    Event<MyBean> event;

    @PostConstruct
    void init() {
        event.fireAsync(this);
    }
}

public class SomeOtherBean {

    void initMyBean(@ObservesAsync MyBean myBean) {
        asyncClient.doSomething().thenAccept(data -> myBean.data = data);
    }
}
```

But it's not very convenient and you need to make sure the bean is in the correct state before it's actually used...

@Ladicek @manovotn Can you think of other approaches?

@imperatorx (Contributor, PR author) commented:

From the javadoc of @PostConstruct: it "needs to be executed after dependency injection is done to perform any initialization". It does not place any restrictions on what can happen in those methods. The I/O in question does not have to be some huge DB query; a bunch of simple 1-5 ms single-query initializations from different beans is enough to pin the carrier threads and cause the application to deadlock.

I'm not sure about the performance impact of ReentrantLock vs synchronized in mostly uncontended scenarios; for example, JEP 353 replaced the synchronized blocks with locks in the JDK socket API, which is on the hot path of most applications.
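
For illustration, a self-contained sketch (JDK 21, all names hypothetical, not code from this PR) of the effect being described: a short blocking call inside a synchronized lazy init keeps the carrier thread pinned while virtual threads wait. Running it with -Djdk.tracePinnedThreads=full should print a report when a virtual thread blocks while pinned.

```java
import java.util.concurrent.Executors;

public class PinningDemo {

    private static final Object LOCK = new Object();
    private static String value;

    static String get() {
        synchronized (LOCK) { // a virtual thread cannot unmount while holding this monitor
            if (value == null) {
                try {
                    Thread.sleep(5); // stands in for a 1-5 ms query in @PostConstruct
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                value = "loaded";
            }
            return value;
        }
    }

    public static void main(String[] args) {
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 4; i++) {
                executor.submit(PinningDemo::get);
            }
        }
    }
}
```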

@Ladicek (Contributor) commented Jul 4, 2023

Well, we could replace locking with a simple CAS-based state machine, something like:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.function.Supplier;

public class LazyValue<T> {
  private static final VarHandle VALUE;

  static {
    try {
      VALUE = MethodHandles.lookup().findVarHandle(LazyValue.class, "value", Object.class);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  private static final Object INITIALIZATION_IN_PROGRESS = new Object();

  private final Supplier<T> supplier;

  // this may be:
  // - `null` in case the value has not been computed yet (or `clear()` was called)
  // - `INITIALIZATION_IN_PROGRESS` in case the value is currently being computed
  // - otherwise: the value has been computed
  private transient volatile T value;

  public LazyValue(Supplier<T> supplier) {
    this.supplier = supplier;
  }

  public T get() {
    T value = this.value;
    if (value != null && value != INITIALIZATION_IN_PROGRESS) {
      return value;
    }
    if (VALUE.compareAndSet(this, null, INITIALIZATION_IN_PROGRESS)) {
      value = supplier.get();
      this.value = value;
      return value;
    } else {
      // TODO block until `value` is computed
    }
  }

  public T getIfPresent() {
    T value = this.value;
    return value == INITIALIZATION_IN_PROGRESS ? null : value;
  }

  public void clear() {
    value = null;
  }

  public boolean isSet() {
    T value = this.value;
    return value != null && value != INITIALIZATION_IN_PROGRESS;
  }
}
```

The TODO is left as an exercise for the reader.

Actually no, I have no idea how I would implement that TODO without implementing a lock :-) Perhaps this is a terrible idea.

@franz1981 (Contributor) commented:

@Ladicek it is fine, just use a CompletableFuture, which can be used to await the computation (and will be friendly with Loom).

@imperatorx

> to pin the carrier threads and cause the application to deadlock.

I hope it won't deadlock, just "wait" without being (virtually) context switched.

> for example, JEP 353 replaced the synchronized blocks with locks in the JDK socket API, which is on the hot path of most applications.

They have to support Loom there, but there's currently some ongoing work in the JDK to make synchronized more Loom-friendly as well.

As said, I think we need to better capture the users' requirements for these specific cases.

@Ladicek (Contributor) commented Jul 4, 2023

I don't immediately see how I would use a CompletableFuture, but that would also have a serious overhead, would it not?

(Also, a trivially correct but pretty much horrible implementation of the TODO is of course a spin lock:

```java
while (!isSet()) {
  Thread.onSpinWait();
}
return this.value;
```

I have no idea how Loom-friendly that is 😆)

@franz1981 (Contributor) commented Jul 4, 2023

@Ladicek you can use a compareAndSet to install a CompletableFuture as a promise of the future computed value, and then use it both to wait and to wake up after the computation has completed (whoever wins the compareAndSet wins the race to populate it). And yes... it's terribly costly (at this point ReentrantLock is fine and better, I think!)
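
For illustration, a sketch of the approach described above (hypothetical class name, not code from this PR): CAS in a CompletableFuture as the promise; the CAS winner runs the supplier and completes it, everyone else just joins, which parks a virtual thread without pinning.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class PromiseLazyValue<T> {

    private final Supplier<T> supplier;
    private final AtomicReference<CompletableFuture<T>> promise = new AtomicReference<>();

    public PromiseLazyValue(Supplier<T> supplier) {
        this.supplier = supplier;
    }

    public T get() {
        CompletableFuture<T> existing = promise.get();
        if (existing == null) {
            CompletableFuture<T> candidate = new CompletableFuture<>();
            if (promise.compareAndSet(null, candidate)) {
                // we won the race: compute the value and complete the promise
                try {
                    candidate.complete(supplier.get());
                } catch (Throwable t) {
                    candidate.completeExceptionally(t);
                    throw t;
                }
                existing = candidate;
            } else {
                existing = promise.get(); // somebody else won; use their promise
            }
        }
        return existing.join(); // parks the caller until the value is available
    }
}
```

As noted, each first access allocates a CompletableFuture (plus waiter nodes when contended), which is exactly the cost being weighed against the plain ReentrantLock here.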

@Ladicek (Contributor) commented Jul 4, 2023

Ah yeah, that would work, but as you mention, it has a pretty high cost. I guess we could implement a tailored lock by subclassing AbstractQueuedSynchronizer, but I'm not sure we'd really beat ReentrantLock 🤷

This PR seems good enough to me.

(Of course, I'd much prefer being able to give the JVM a hint that synchronization is not an externally visible contract here and it would do whatever it wants...)
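
For what it's worth, the "tailored lock by subclassing AbstractQueuedSynchronizer" idea would look roughly like the non-reentrant mutex example from the AQS javadoc; a sketch, not a proposal for this PR:

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// Minimal non-reentrant mutex built on AQS: state 0 = unlocked, 1 = locked.
public class SimpleMutex extends AbstractQueuedSynchronizer {

    @Override
    protected boolean tryAcquire(int ignored) {
        return compareAndSetState(0, 1);
    }

    @Override
    protected boolean tryRelease(int ignored) {
        setState(0);
        return true;
    }

    public void lock() {
        acquire(1); // parks the caller (virtual-thread friendly) if the mutex is held
    }

    public void unlock() {
        release(1);
    }
}
```

Since waiters are parked via LockSupport, this is as Loom-friendly as ReentrantLock, but it is essentially ReentrantLock's own machinery with less functionality.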

@mkouba (Contributor) commented Jul 7, 2023

Ok, so if I understand the discussion above correctly, we do want to use the ReentrantLock version for now, right?

@franz1981 @Ladicek

@Ladicek (Contributor) commented Jul 10, 2023

Yeah, I think we should accept this (if we care enough, that is). It is a little sad that the size of the LazyValue object increases, as well as the number of allocations, but the overall effect should be negligible IMHO. @franz1981 WDYT?

@mkouba (Contributor) commented Jul 10, 2023

> Yeah, I think we should accept this (if we care enough, that is). It is a little sad that the size of the LazyValue object increases, as well as the number of allocations, but the overall effect should be negligible IMHO. @franz1981 WDYT?

It would be interesting to see the numbers from some of our benchmarks before we merge this PR...

```
@@ -34,9 +42,9 @@ public T getIfPresent() {
    }

    public void clear() {
        synchronized (this) {
```

You can check outside of this critical zone whether it has already been cleared and save entering it.
@mkouba do we expect to be able to init it again after a clear?

Yes, that can happen - the first thing that comes to my mind is dynamic resolution and caching of its result, which can then be cleared and subsequently cached again - the cache is a LazyValue.
See https://github.com/quarkusio/quarkus/blob/main/independent-projects/arc/runtime/src/main/java/io/quarkus/arc/impl/InstanceImpl.java#L304
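
For illustration, the pre-check suggested above could look like this, assuming the PR's volatile value and ReentrantLock lock fields (see the layout dump below), and keeping in mind that re-initialization after clear() must stay possible:

```java
public void clear() {
    if (value == null) {
        return; // nothing to clear, skip the critical section entirely
    }
    lock.lock();
    try {
        value = null;
    } finally {
        lock.unlock();
    }
    // Note: a concurrent get() may publish a new value right after the early
    // return; whether that matters depends on the intended clear() semantics.
}
```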

@franz1981 (Contributor) commented:

In terms of memory footprint...
What we have now, with compressed oops and compressed class pointers (8-byte alignment), is:

```
io.quarkus.qute.LazyValue object internals:
OFF  SZ          TYPE DESCRIPTION               VALUE
  0   8               (object header: mark)     N/A
  8   4               (object header: class)    N/A
 12   4   Supplier<T> LazyValue.supplier        N/A
 16   4             T LazyValue.value           N/A
 20   4               (object alignment gap)
Instance size: 24 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
```

while with this PR it is:

```
io.quarkus.qute.LazyValue object internals:
OFF  SZ            TYPE DESCRIPTION               VALUE
  0   8                 (object header: mark)     N/A
  8   4                 (object header: class)    N/A
 12   4     Supplier<T> LazyValue.supplier        N/A
 16   4   ReentrantLock LazyValue.lock            N/A
 20   4               T LazyValue.value           N/A
Instance size: 24 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
```

That means the additional ReentrantLock instance is adding some footprint cost, which is 16 bytes for the ReentrantLock itself:

```
java.util.concurrent.locks.ReentrantLock object internals:
OFF  SZ   TYPE DESCRIPTION               VALUE
  0   8        (object header: mark)     N/A
  8   4        (object header: class)    N/A
 12   4   Sync ReentrantLock.sync        N/A
Instance size: 16 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
```

plus 28 for its NonfairSync, shown below:

```
java.util.concurrent.locks.ReentrantLock.NonfairSync object internals:
OFF  SZ     TYPE DESCRIPTION                                        VALUE
  0   8          (object header: mark)                              N/A
  8   4          (object header: class)                             N/A
 12   4   Thread AbstractOwnableSynchronizer.exclusiveOwnerThread   N/A
 16   4      int AbstractQueuedSynchronizer.state                   N/A
 20   4     Node AbstractQueuedSynchronizer.head                    N/A
 24   4     Node AbstractQueuedSynchronizer.tail                    N/A
 28   4          (object alignment gap)
Instance size: 32 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
```

That's then +44 bytes more. In short, depending on how many instances we have, it could be relevant (or a drop in the ocean too?).
We could save some by making LazyValue extend ReentrantLock and be ready for the "God of developers" judgment when we all retire, too.
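
For reference, layout dumps like the ones above can be produced with JOL (org.openjdk.jol:jol-core); a small sketch, not part of this PR:

```java
import java.util.concurrent.locks.ReentrantLock;

import org.openjdk.jol.info.ClassLayout;

public class FootprintCheck {

    public static void main(String[] args) {
        // Prints the field layout, instance size and alignment gaps of a class.
        System.out.println(ClassLayout.parseClass(ReentrantLock.class).toPrintable());
        // Point parseClass at io.quarkus.qute.LazyValue (or the ArC variant) on the
        // classpath to reproduce the LazyValue numbers above.
    }
}
```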

@mkouba (Contributor) commented Jul 12, 2023

> That's then +44 bytes more. In short, depending on how many instances we have, it could be relevant (or a drop in the ocean too?).

I think that it's acceptable.

> We could save some by making LazyValue extend ReentrantLock and be ready for the "God of developers" judgment when we all retire, too.

Or rather not ;-)

@mkouba mkouba force-pushed the virtual-thread-friendly-lazyvalue branch from 2234cd3 to ff28650 Compare July 12, 2023 12:41
@mkouba mkouba added the triage/waiting-for-ci Ready to merge when CI successfully finishes label Jul 12, 2023
@quarkus-bot commented Jul 12, 2023

Failing Jobs - Building ff28650

✔️ Maven Tests - JDK 11
⚠️ Maven Tests - JDK 11 Windows - Build step (Check → Logs / Raw logs)

@geoand (Contributor) commented Jul 19, 2023

What's the status of this?

@manovotn (Contributor) commented:

> What's the status of this?

I think it's mergeable unless @franz1981 has anything else we should address here.

@geoand geoand merged commit 0e89470 into quarkusio:main Jul 26, 2023
@quarkus-bot quarkus-bot bot removed the triage/waiting-for-ci Ready to merge when CI successfully finishes label Jul 26, 2023
@quarkus-bot quarkus-bot bot added this to the 3.3 - main milestone Jul 26, 2023

Labels: area/arc - Issue related to ARC (dependency injection)
6 participants