POC =act Make use of externalSubmit to yield. #666

He-Pin · 2023-09-20T19:00:51Z

refs: #661

Some previous work done by @jrudolph years ago?

The Current VirtualThrad in JDK 21 making use of the externalSubmit

if (s == YIELDING) {   // Thread.yield
            setState(RUNNABLE);

            // notify JVMTI that unmount has completed, thread is runnable
            notifyJvmtiUnmount(/*hide*/false);

            // external submit if there are no tasks in the local task queue
            if (currentThread() instanceof CarrierThread ct && ct.getQueuedTaskCount() == 0) {
                externalSubmitRunContinuation(ct.getPool());
            } else {
                submitRunContinuation();
            }
        }

I think this change will affect the performance.

I think a better way should be explicit yield after some run, just as the cede of CE, which @armanbilge metioned.

He-Pin · 2023-09-20T19:05:49Z

actor-tests/src/test/scala/org/apache/pekko/dispatch/ForkJoinPoolStarvationSpec.scala

-      if (JavaVersion.majorVersion >= 17)
-        pending
+//      if (JavaVersion.majorVersion >= 17)
+//        pending


I tested it locally.

jrudolph · 2023-09-21T07:30:09Z

Thanks for reviving this experiment.

In any case, this would need very close auditing of performance consequences (which may be different under various JDKs). It was quite a contentious issue originally whether this should be fixed at all.

Without looking into the current situation, I'd say, let's not fix anything here without very good reason that something is broken in realistic cases. It seems likely that a fix like this will negatively affect throughput for everyone while improving fairness only in very specific cases. If we want to play with this, we would have to introduce a config setting that is off by default so people could try it in various circumstances.

He-Pin · 2023-09-21T07:43:49Z

@jrudolph Yes, I think the current VirtualThread has the same problem, which only yield to the submissionQueue when there is no works to do, and VirutalThread just do lazySubmit for most the time.

It very hard to make things right, the code changes from JDK release to release, and the current impl will hurt performance.

refs: typelevel/cats-effect#3070
refs: https://github.com/typelevel/cats-effect/releases

I asked @armanbilge , he said the WSTP can be used outside of CE too.

pjfanning · 2023-09-21T07:59:05Z

Anyone know if there is anything in the sbt ecosystem to support Multi-Release jars. That would allow us to have tailored versions of some classes that are used when you use a particular JDK/JRE version at runtime. This would be more efficient than having one version of the code and checking the JDK/JRE version using if stataments.

He-Pin · 2023-09-21T08:02:44Z

@pjfanning I asked @Glavo once for this and she said the toolchian is not very well, the reactor-core is using multi-release jar.

mdedetrich · 2023-09-21T08:03:29Z

Anyone know if there is anything in the sbt ecosystem to support Multi-Release jars. That would allow us to have tailored versions of some classes that are used when you use a particular JDK/JRE version at runtime. This would be more efficient than having one version of the code and checking the JDK/JRE version using if stataments.

Its a request feature that is yet to be implemented, see sbt/sbt#7174 (comment). I was thinking of working on this at some point but there are already so many OS things that I am working on flying around atm however I can re-prioritize to look into it (it does seem we will need this sooner or later). Having multi-release jar's will definitely make a lot of this stuff much cleaner.

I don't think it should be too hard, I think SBT already has functionality to detect multiple JDK's, what needs to be fleshed out is how the matrix will work, i.e. will we have scala-java8 folders for sources only compiled for JDK 8 and will stuff like scala-2-java8/scala-2.13+-java11 work?

pjfanning · 2023-09-21T08:08:22Z

I don't think it should be too hard, I think SBT already has functionality to detect multiple JDK's, what needs to be fleshed out is how the matrix will work, i.e. will we have scala-java8 folders for sources only compiled for JDK 8 and will stuff like scala-2-java8/scala-2.13+-java11 work?

Sounds reasonable but we might also need to support src/main/java11 too (java code that should be used for Java 11+).

mdedetrich · 2023-09-21T08:09:15Z

Sounds reasonable but we might also need to support src/main/java11 too (java code that should be used for Java 11+).

Yes thats implied

jrudolph · 2023-09-21T08:29:17Z

One thing to keep in mind is that Pekko/Akka is already somewhat biased against fairness by choosing relatively big batching values both for the actor "throughput" setting and addition to that for the stream "sync processing limit". Therefore, a lot of the scheduling overhead that other effect systems might see because of pushing every single thunk through the general scheduler (fine-grained parallelism) might not apply to Pekko, especially when using streams. In general, this can lead to higher tail latencies in high load scenarios, though it depends a lot on actual setup of streams etc.

In general, providing fairness to get more predictable latency is a worthwhile goal. My impression is that low-level optimizations will likely not cut it. A high-level reason is that under overload scenarios (the only ones where fairness is relevant), some work is more important than other (work which drives the control flow vs actual CPU intensive workloads). However you structure your work queues as long as you are not able to prioritize more important work against greedy CPU intensive workload tasks (which the one in this example is with a tight loop of actors sending messages back and forth), you will not be able to solve the issues of calculations getting stuck because control flow is starved of CPU resources. (See also, how tight loops can prevent GCs from happening). One approach is to bulk-head CPU heavy but low-priority work out of the general purpose queues that drive control flow (e.g. introduction of internal dispatcher, running CPU workloads out of the streams, creating dispatchers for blocking work, creating extra OS threads to push back fairness issues to the OS scheduler). Unfortunately, these kind of solutions require manual interventions and understanding of the systems.

jrudolph · 2023-09-21T08:33:19Z

And to be clear, props to cats-effect (and also Tokio) to push the state of the art on this. Let's just not jump to conclusions regarding Pekko about this.

mdedetrich · 2023-09-22T08:21:29Z

In general, providing fairness to get more predictable latency is a worthwhile goal. My impression is that low-level optimizations will likely not cut it. A high-level reason is that under overload scenarios (the only ones where fairness is relevant), some work is more important than other (work which drives the control flow vs actual CPU intensive workloads). However you structure your work queues as long as you are not able to prioritize more important work against greedy CPU intensive workload tasks (which the one in this example is with a tight loop of actors sending messages back and forth), you will not be able to solve the issues of calculations getting stuck because control flow is starved of CPU resources. (See also, how tight loops can prevent GCs from happening). One approach is to bulk-head CPU heavy but low-priority work out of the general purpose queues that drive control flow (e.g. introduction of internal dispatcher, running CPU workloads out of the streams, creating dispatchers for blocking work, creating extra OS threads to push back fairness issues to the OS scheduler). Unfortunately, these kind of solutions require manual interventions and understanding of the systems.

If I understand correctly is this basically asking for a higher level API approach where we can assigned certain priorities to flows within a pekko stream (I guess via Attributes) so that in cases of CPU starvation the pekko actor/stream runtime has knowledge of what is more critical to continue and what isn't?

jrudolph · 2023-09-22T08:26:18Z

No, I just want to prevent that we do microoptimizations based on findings of different projects with potentially different premises without having real world problems to solve. ForkJoinPoolStarvationSpec is imo not enough to warrant doing any changes here.

mdedetrich · 2023-09-22T08:32:15Z

No, I just want to prevent that we do microoptimizations based on findings of different projects with potentially different premises without having real world problems to solve. ForkJoinPoolStarvationSpec is imo not enough to warrant doing any changes here.

No disagreements here, I think I got the incorrect impression (I thought you were speaking generally rather than this PR specifically).

Maybe we can make another general discussion thread if its relevant, but I do agree we need to be careful about these cases.

=act Make use of externalSubmit to yield.

8cad2b1

He-Pin commented Sep 20, 2023

View reviewed changes

He-Pin mentioned this pull request Jan 17, 2024

support jdk9 forkjoinpool maximum-pool-size #485

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

POC =act Make use of externalSubmit to yield. #666

POC =act Make use of externalSubmit to yield. #666

He-Pin commented Sep 20, 2023 •

edited

Loading

He-Pin Sep 20, 2023

jrudolph commented Sep 21, 2023

He-Pin commented Sep 21, 2023

pjfanning commented Sep 21, 2023

He-Pin commented Sep 21, 2023 •

edited

Loading

mdedetrich commented Sep 21, 2023 •

edited

Loading

pjfanning commented Sep 21, 2023

mdedetrich commented Sep 21, 2023

jrudolph commented Sep 21, 2023

jrudolph commented Sep 21, 2023

mdedetrich commented Sep 22, 2023

jrudolph commented Sep 22, 2023

mdedetrich commented Sep 22, 2023 •

edited

Loading

POC =act Make use of externalSubmit to yield. #666

Are you sure you want to change the base?

POC =act Make use of externalSubmit to yield. #666

Conversation

He-Pin commented Sep 20, 2023 • edited Loading

He-Pin Sep 20, 2023

Choose a reason for hiding this comment

jrudolph commented Sep 21, 2023

He-Pin commented Sep 21, 2023

pjfanning commented Sep 21, 2023

He-Pin commented Sep 21, 2023 • edited Loading

mdedetrich commented Sep 21, 2023 • edited Loading

pjfanning commented Sep 21, 2023

mdedetrich commented Sep 21, 2023

jrudolph commented Sep 21, 2023

jrudolph commented Sep 21, 2023

mdedetrich commented Sep 22, 2023

jrudolph commented Sep 22, 2023

mdedetrich commented Sep 22, 2023 • edited Loading

He-Pin commented Sep 20, 2023 •

edited

Loading

He-Pin commented Sep 21, 2023 •

edited

Loading

mdedetrich commented Sep 21, 2023 •

edited

Loading

mdedetrich commented Sep 22, 2023 •

edited

Loading