Use Gradle Worker API #2903

vmishenev · 2023-03-07T18:30:29Z

WIP

Motivation

Currently, Dokka reloads a large classpath for every task. It also relies on reflection magic to support different versions of plugins.
Dokka also parallelizes poorly. For example, the run time on a project with 100 tasks is 8 minutes.

Proposal

The Gradle Worker API can help Dokka avoid these problems. There are two options for using it:

1. Use a cached classpath and noIsolation mode (keeping the classpath in a static variable that is shared between tasks). This approach is used in Kapt.
2. Use processIsolation mode. The Dokka task will be executed in worker daemon processes. Running processes with the same classpath can be reused for other tasks, so the classpath is loaded once per process.

We need to choose only one option.
We already have prototypes. @aSemy created one for the second option and there is a prototype for the cached classpath here.
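
To make the two options concrete, here is a minimal sketch of a task that submits work through the Worker API in either mode. It is not taken from either prototype: GenerateDocsTask, GenerateDocsAction, GenerateDocsParameters and the useProcessIsolation switch are hypothetical names, and the file is assumed to live in buildSrc (or an included build) with the kotlin-dsl plugin applied, which is what allows the receiver-style lambdas.

```kotlin
import javax.inject.Inject
import org.gradle.api.DefaultTask
import org.gradle.api.file.ConfigurableFileCollection
import org.gradle.api.file.DirectoryProperty
import org.gradle.api.provider.Property
import org.gradle.api.tasks.Classpath
import org.gradle.api.tasks.Input
import org.gradle.api.tasks.OutputDirectory
import org.gradle.api.tasks.TaskAction
import org.gradle.workers.WorkAction
import org.gradle.workers.WorkParameters
import org.gradle.workers.WorkerExecutor

// Hypothetical parameters passed to each unit of work.
interface GenerateDocsParameters : WorkParameters {
    val destination: DirectoryProperty
}

// Hypothetical unit of work; the real generator call would go here.
abstract class GenerateDocsAction : WorkAction<GenerateDocsParameters> {
    override fun execute() {
        val dir = parameters.destination.get().asFile
        dir.resolve("index.txt").writeText("generated")
    }
}

abstract class GenerateDocsTask @Inject constructor(
    private val workerExecutor: WorkerExecutor,
) : DefaultTask() {

    @get:Classpath
    abstract val generatorClasspath: ConfigurableFileCollection

    @get:Input
    abstract val useProcessIsolation: Property<Boolean>

    @get:OutputDirectory
    abstract val outputDir: DirectoryProperty

    @TaskAction
    fun generate() {
        val queue = if (useProcessIsolation.get()) {
            // Option 2: run in a worker daemon process that has the generator
            // classpath; Gradle can reuse a compatible daemon for later tasks.
            workerExecutor.processIsolation {
                classpath.from(generatorClasspath)
            }
        } else {
            // Option 1: run inside the build process. Caching the generator
            // classpath (e.g. in a static field) would be the plugin's own job
            // and is not shown here.
            workerExecutor.noIsolation()
        }
        queue.submit(GenerateDocsAction::class.java) {
            destination.set(outputDir)
        }
    }
}
```

In both modes the task action returns as soon as the work is queued, which is what lets Gradle run several such tasks at the same time.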

Pros&cons

From a performance point of view, processIsolation needs time to start the worker processes (by default, as many as there are CPU processors) and to load the classpath in each process. However, if the daemons are already running, Dokka's execution time is the same as with the cached classpath.
My observations:
- Coroutines (precompiled): ~35s (cached classpath) vs ~66s (processIsolation with 8 workers)
- Small project with 100 tasks: ~20s (cached classpath) vs ~60s (processIsolation with 8 workers)
My hypothesis: the difference in execution time between the two approaches is a constant cost of starting the processes and loading the classpaths (on my computer with 8 workers, the difference is ~40 sec). However, the constant depends on the number of workers (e.g. coroutines with --max-workers=2 takes 44 sec).
We can also adjust the number of workers (depending on the number of tasks) to get better performance.
Stability. noIsolation needs synchronization for static state. For example, we have a data race on static properties in the IDE part. Kapt has also experienced OOM issues. Since we use external libraries (and the compiler), and Dokka is not designed for a multithreaded environment, more problems of this kind may appear.
Library compatibility. The Gradle documentation says: "External libraries may rely on certain system properties to be set which may conflict between work items. Or a library might not be compatible with the version of JDK that Gradle is running with and may need to be run with a different version." But I am not sure that this is relevant for our external libraries.

There are other points for choosing an approach. Everybody can share their opinion about it here.

To sum up, I personally vote for the first option. In the case of Dokka, build time will increase a little, but stability is more important than performance.

aSemy · 2023-03-08T09:24:24Z

There's another isolation mode too, classLoaderIsolation(). It's a halfway point between the other two, and I think it's worth considering. I strongly suspect that it would be the most performant of the 3 options.

One benefit of processIsolation is that it prevents problems with coroutines, so hopefully it would make the finalizeCoroutines option obsolete.

dokka/core/src/main/kotlin/configuration.kt

Lines 139 to 156 in 14c05d7

    
    /**
     * Whether coroutines dispatchers should be shutdown after
     * generating documentation via [DokkaGenerator.generate].
     *
     * It effectively stops all background threads associated with
     * coroutines in order to make classes unloadable by the JVM,
     * and rejects all new tasks with [RejectedExecutionException]
     *
     * This is primarily useful for multi-module builds where coroutines
     * can be shut down after each module's partial task to avoid
     * possible memory leaks.
     *
     * However, this can lead to problems in specific lifecycles where
     * coroutines are shared and will be reused after documentation generation,
     * and closing it down will leave the build in an inoperable state.
     * One such example is unit tests, for which finalization should be disabled.
     */
    val finalizeCoroutines: Boolean

Whatever option is chosen, I think #2740 will be broadly the same.

TWiStErRob · 2023-07-25T08:35:19Z

@aSemy What is the reason for Dokka tasks executing one by one, one after the other in a Gradle parallel build? When I do a publish I can see compilation and all other tasks complete very quickly, then all workers (28+) are busy with different Dokka tasks from submodules. The console logs are suggesting that one needs to complete before the next one can start.

aSemy · 2023-07-25T09:53:21Z

> @aSemy What is the reason for Dokka tasks executing one by one, one after the other in a Gradle parallel build? When I do a publish I can see compilation and all other tasks complete very quickly, then all workers (28+) are busy with different Dokka tasks from submodules. The console logs are suggesting that one needs to complete before the next one can start.

DGP (Dokka Gradle Plugin) is not compatible with many Gradle features like project isolation, build cache, configuration cache - see #2700 - so tasks run sequentially, even across different subprojects. And DGP does not use the Worker API at the moment (hence this issue and my PR #2740).

If you want to generate Dokka docs faster, then look at Dokkatoo. Dokkatoo is a re-implemented DGP that supports all the speedy Gradle features. Dokkatoo is not a drop-in replacement for DGP, but it's pretty similar, and you can add both DGP and Dokkatoo to the same project to verify that the output of both is identical.

(Funny that you've pinged me (just a contributor) rather than the actual maintainers @IgnatBeresnev and @vmishenev 😄)

TWiStErRob · 2023-07-25T10:15:56Z

I pinged you exactly because of all those issues, PRs and repo you linked. I know DGP is not compatible with fancy new features, but org.gradle.parallel is a pretty pretty old feature. I'm wondering what is locking inside Dokka that doesn't allow basic parallelism.

aSemy · 2023-07-25T10:42:04Z

> I pinged you exactly because of all those issues, PRs and repo you linked. I know DGP is not compatible with fancy new features, but org.gradle.parallel is a pretty pretty old feature. I'm wondering what is locking inside Dokka that doesn't allow basic parallelism.

Ah okay. Hmm, I'm not sure I can give a definitive answer because I'm not exactly sure how --parallel works and what the requirements are, and precisely what DGP is doing that would prevent it, but these items in particular are related:

- DGP multi-module tasks depend on DGP tasks from other subprojects. This means that subprojects are not isolated, and Dokka tasks from one subproject might have dependencies on tasks in other subprojects.

  dokka/runners/gradle-plugin/src/main/kotlin/org/jetbrains/dokka/gradle/DokkaPlugin.kt

  Line 100 in 900fbcc

      addSubprojectChildTasks(name)

  dokka/runners/gradle-plugin/src/main/kotlin/org/jetbrains/dokka/gradle/DokkaPlugin.kt

  Line 83 in 900fbcc

      addSubprojectChildTasks("${name}Partial")

  Dokkatoo fixes this by using Configurations to share files.

- DGP fetches the classpath directly from the Kotlin compilation task.

  dokka/runners/gradle-plugin/src/main/kotlin/org/jetbrains/dokka/gradle/kotlin/kotlinClasspathUtils.kt

  Line 63 in 900fbcc

      kotlinCompile.libraries // introduced in 1.7.0

  While this won't necessarily cause problems, it does cause task-linking that shouldn't exist. When coupled with cross-project tasks, this will probably interfere with Gradle's workings and add unnecessary coupling between tasks.

  Dokkatoo fixes this by fetching the dependencies via the resolvable Configurations.
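
As a rough illustration of sharing files through Configurations instead of cross-project task dependencies, here is a generic Gradle sketch. It is not Dokkatoo's actual code: the project paths (:lib, a docs aggregator), the configuration and task names, and the "example-docs" usage value are all made up for the example, and the two halves belong in two different build scripts.

```kotlin
// lib/build.gradle.kts (producer; all names are hypothetical)
val generateExampleDocs by tasks.registering(Sync::class) {
    from("src/docs")
    into(layout.buildDirectory.dir("example-docs"))
}

// Consumable configuration: exposes the task's output directory to other projects.
val exampleDocsElements by configurations.creating {
    isCanBeConsumed = true
    isCanBeResolved = false
    attributes {
        attribute(Usage.USAGE_ATTRIBUTE, objects.named(Usage::class.java, "example-docs"))
    }
    // Mapping the TaskProvider keeps the information about which task builds the directory.
    outgoing.artifact(generateExampleDocs.map { it.destinationDir })
}

// docs/build.gradle.kts (consumer; all names are hypothetical)
// Resolvable configuration: requests the same attribute from its dependencies.
val exampleDocs by configurations.creating {
    isCanBeConsumed = false
    isCanBeResolved = true
    attributes {
        attribute(Usage.USAGE_ATTRIBUTE, objects.named(Usage::class.java, "example-docs"))
    }
}

dependencies {
    exampleDocs(project(":lib"))
}

tasks.register<Sync>("collectExampleDocs") {
    // Using the configuration as an input lets Gradle work out which
    // producer tasks must run first; no task in :lib is referenced by name.
    from(exampleDocs)
    into(layout.buildDirectory.dir("all-example-docs"))
}
```

The consumer never touches the producer's tasks directly, which is the property that keeps subprojects isolated from each other.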

TWiStErRob · 2023-07-25T10:51:15Z

Thanks for the pointers!

martinbonnin · 2024-01-08T09:41:47Z

+1 for classLoaderIsolation or at least making the isolation configurable. In addition to performance, classLoaderIsolation makes it way easier to debug the builds.

adam-enko · 2024-04-09T08:01:09Z

One potential benefit of process isolation is that it would allow for class data sharing. With a multimodule project, Dokka Generator has to run multiple times with the same classpath. The Dokka classpath can be quite large (the analyzer component is ~80MB). Using CDS would mean that the classes could be 'cached' between generations, improving startup time and reducing memory usage.

whyoleg · 2024-12-16T16:26:12Z

Implemented in Dokka 2.0.0 in Dokka Gradle plugin v2

By default, Dokka will be executed with processIsolation and xmx=2GB because of gradle/gradle#18313.
Check the Troubleshooting section of the migration guide if the default doesn't fit your project.
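
For orientation, overriding the default in build.gradle.kts looks roughly like the sketch below. The dokkaGeneratorIsolation property and the ProcessIsolation / ClassLoaderIsolation names are written from memory of that Troubleshooting section, and 4g is only an example value, so treat them as assumptions and check them against the guide for your Dokka version.

```kotlin
// build.gradle.kts (assumes Dokka Gradle plugin v2 is applied)
dokka {
    // Keep process isolation, but give the worker process a larger heap.
    dokkaGeneratorIsolation = ProcessIsolation {
        maxHeapSize = "4g" // example value, not a recommendation
    }

    // Alternative (assumed name): run the generator inside the Gradle process.
    // dokkaGeneratorIsolation = ClassLoaderIsolation()
}
```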

martinbonnin · 2024-12-16T16:35:05Z

FWIW, my view on this has changed. I now try to avoid both classloader and process isolation altogether because of the memory leaks and/or the multiple processes/heaps to manage.

Instead, I'm now managing a classloader cache in a buildservice. This is a bit more manual work but at least it gives more control over the classloader lifecycle. A build service can cache a classloader for a given build but could even decide more advanced strategies like "keep alive for 30min" or "as long as the Gradle daemon is alive", ...

The only advantage I now see in using the worker API is to give parallel task execution to users without configuration cache enabled.
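
A minimal sketch of the caching part, in plain Kotlin with no Gradle types so that it stays self-contained. ClassLoaderCache and loaderFor are hypothetical names, the platform class loader parent assumes JDK 9+, and a real implementation would also need an eviction policy:

```kotlin
import java.io.File
import java.net.URLClassLoader
import java.util.concurrent.ConcurrentHashMap

/** Hypothetical cache that keeps one URLClassLoader per distinct classpath. */
class ClassLoaderCache : AutoCloseable {
    // Keyed by absolute paths so that equal classpaths share one loader.
    private val loaders = ConcurrentHashMap<List<String>, URLClassLoader>()

    fun loaderFor(classpath: List<File>): ClassLoader {
        val key = classpath.map { it.absolutePath }
        return loaders.computeIfAbsent(key) { paths ->
            val urls = paths.map { File(it).toURI().toURL() }.toTypedArray()
            // Parent is the platform class loader (JDK 9+), so classes of the
            // host application do not leak into the cached loader.
            URLClassLoader(urls, ClassLoader.getPlatformClassLoader())
        }
    }

    /** Closes every cached loader, releasing the jar files they hold open. */
    override fun close() {
        loaders.values.forEach { it.close() }
        loaders.clear()
    }
}
```

In Gradle, an object like this could be owned by a build service: a BuildService implementation may implement AutoCloseable, and Gradle calls close() when it discards the service, which is where the lifecycle control comes from.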

adam-enko · 2024-12-17T13:07:06Z

> Instead, I'm now managing a classloader cache in a buildservice. This is a bit more manual work but at least it gives more control over the classloader lifecycle. A build service can cache a classloader for a given build but could even decide more advanced strategies like "keep alive for 30min" or "as long as the Gradle daemon is alive", ...

This sounds like a good approach, and something Dokka could use, but be careful you don't stub your toe on gradle/gradle#17559 :)

vmishenev added enhancement An issue for a feature or an overall improvement feedback: Kotlin libs Feedback from Kotlin's internal libraries labels Mar 7, 2023

IgnatBeresnev added the runner: Gradle plugin An issue/PR related to Dokka's Gradle plugin label Mar 8, 2023

aSemy mentioned this issue Mar 8, 2023

Improve compatibility with the Gradle API, and follow best practices #2700

Closed

qwwdfsad mentioned this issue Mar 8, 2023

Dokka spends 35%+ of its time in GC on kotlinx.coroutines #2729

Closed

qwwdfsad mentioned this issue Mar 21, 2023

Investigate possibility to avoid kotlinx.coroutines dependency #2936

Open

IgnatBeresnev added this to the Gradle runner 2.0 milestone Aug 17, 2023

adam-enko added the runner: gradle plugin v2 Issues fixed by Dokka Gradle Plugin v2 - see https://github.com/Kotlin/dokka/issues/3131 label Aug 28, 2024

adam-enko modified the milestones: Gradle runner 2.0, Dokka 2.0.0 Aug 28, 2024

whyoleg closed this as completed Dec 16, 2024
