From 4bf1f86d1af9add915c9e97f18c76dd380cf6bfb Mon Sep 17 00:00:00 2001
From: Tbkhi <157125900+Tbkhi@users.noreply.github.com>
Date: Mon, 4 Mar 2024 14:25:31 -0400
Subject: [PATCH 1/3] Update parallel-rustc.md

Minor updates to syntax and improvements for readability.
---
 src/parallel-rustc.md | 56 +++++++++++++++++++++----------------------
 1 file changed, 27 insertions(+), 29 deletions(-)

diff --git a/src/parallel-rustc.md b/src/parallel-rustc.md
index 9942f751a..5a4472755 100644
--- a/src/parallel-rustc.md
+++ b/src/parallel-rustc.md
@@ -1,12 +1,11 @@
 # Parallel Compilation
 
-As of August 2022, the only stage of the compiler that
-is already parallel is codegen. Some parts of the compiler already have
-parallel implementations, such as query evaluation, type check and
-monomorphization, but the general version of the compiler does not include
-these parallelization functions. **To try out the current parallel compiler**,
-one can install rustc from source code with `parallel-compiler = true` in
-the `config.toml`.
+As of August 2022, the only stage of the compiler that is
+parallel is codegen. Some parts of the compiler have parallel implementations,
+such as query evaluation, type check and [monomorphization][monomorphization],
+but the general version of the compiler does not include parallelization
+functions. **To try out the current parallel compiler**, install `rustc` from
+source code with `parallel-compiler = true` in the `Config.toml`.
 
 The lack of parallelism at other stages (for example, macro expansion) also
 represents an opportunity for improving compiler performance.
@@ -14,9 +13,9 @@ represents an opportunity for improving compiler performance.
 These next few sections describe where and how parallelism is currently used,
 and the current status of making parallel compilation the default in `rustc`.
 
-## Codegen
+## Code Generation
 
-During [monomorphization][monomorphization] the compiler splits up all the code to
+During monomorphization the compiler splits up all the code to
 be generated into smaller chunks called _codegen units_. These are then generated
 by independent instances of LLVM running in parallel. At the end, the linker
 is run to combine all the codegen units together into one binary. This process
@@ -45,22 +44,22 @@ are implemented differently depending on whether `parallel-compiler` is true.
 | LockGuard | parking_lot::MutexGuard | std::cell::RefMut |
 | MappedLockGuard | parking_lot::MappedMutexGuard | std::cell::RefMut |
 
-- These thread-safe data structures interspersed during compilation can
-  cause a lot of lock contention, which actually degrades performance as the
-  number of threads increases beyond 4. This inspires us to audit the use
-  of these data structures, leading to either refactoring to reduce use of
-  shared state, or persistent documentation covering invariants, atomicity,
-  and lock orderings.
+- These thread-safe data structures are interspersed during compilation which
+  can cause lock contention resulting in degraded performance as the number of
+  threads increases beyond 4. So we audit the use of these data structures
+  which leads to either a refactoring so as to reduce the use of shared state,
+  or the authoring of persistent documentation covering the specifics of the
+  invariants, the atomicity, and the lock orderings.
 
 - On the other hand, we still need to figure out what other invariants
   during compilation might not hold in parallel compilation.
 
 ### WorkLocal
 
-`WorkLocal` is a special data structure implemented for parallel compiler.
-It holds worker-locals values for each thread in a thread pool. You can only
-access the worker local value through the Deref impl on the thread pool it
-was constructed on. It will panic otherwise.
+`WorkLocal` is a special data structure implemented for parallel compilers. It
+holds worker-locals values for each thread in a thread pool. You can only
+access the worker local value through the `Deref` `impl` on the thread pool it
+was constructed on. It panics otherwise.
 
 `WorkLocal` is used to implement the `Arena` allocator in the parallel
 environment, which is critical in parallel queries. Its implementation
@@ -115,7 +114,7 @@ There are still many loops that have the potential to use parallel iterators.
 The query model has some properties that make it actually feasible to evaluate
 multiple queries in parallel without too much of an effort:
 
-- All data a query provider can access is accessed via the query context, so
+- All data a query provider can access is via the query context, so
   the query context can take care of synchronizing access.
 - Query results are required to be immutable so they can safely be used by
   different threads concurrently.
@@ -141,25 +140,24 @@ the previous `Data Structures` and `Parallel Iterators`. See [this tracking issu
 ## Rustdoc
 
 As of November 2022, there are still a number of steps
-to complete before rustdoc rendering can be made parallel. More details on
-this issue can be found [here][parallel-rustdoc].
+to complete before `rustdoc` rendering can be made parallel (see a discussion of
+[parallel `rustdoc`][parallel-rustdoc]).
 
 ## Resources
 
-Here are some resources that can be used to learn more (note that some of them
-are a bit out of date):
+Here are some resources that can be used to learn more:
 
+- [This IRLO thread by alexchricton about performance][irlo1]
 - [This IRLO thread by Zoxc, one of the pioneers of the effort][irlo0]
 - [This list of interior mutability in the compiler by nikomatsakis][imlist]
-- [This IRLO thread by alexchricton about performance][irlo1]
 
 [`rayon`]: https://crates.io/crates/rayon
-[rustc-rayon]: https://github.com/rust-lang/rustc-rayon
-[irlo0]: https://internals.rust-lang.org/t/parallelizing-rustc-using-rayon/6606
+[Arc]: https://doc.rust-lang.org/std/sync/struct.Arc.html
 [imlist]: https://github.com/nikomatsakis/rustc-parallelization/blob/master/interior-mutability-list.md
+[irlo0]: https://internals.rust-lang.org/t/parallelizing-rustc-using-rayon/6606
 [irlo1]: https://internals.rust-lang.org/t/help-test-parallel-rustc/11503
-[tracking]: https://github.com/rust-lang/rust/issues/48685
 [monomorphization]: backend/monomorph.md
 [parallel-rustdoc]: https://github.com/rust-lang/rust/issues/82741
-[Arc]: https://doc.rust-lang.org/std/sync/struct.Arc.html
 [Rc]: https://doc.rust-lang.org/std/rc/struct.Rc.html
+[rustc-rayon]: https://github.com/rust-lang/rustc-rayon
+[tracking]: https://github.com/rust-lang/rust/issues/48685

From a2f1ab248dd9afd9ab4a7232a742f4b64ceac55e Mon Sep 17 00:00:00 2001
From: Tbkhi
Date: Tue, 5 Mar 2024 11:53:05 -0400
Subject: [PATCH 2/3] doc updates

---
 src/parallel-rustc.md | 62 +++++++++++++++++++++++++------------------
 1 file changed, 36 insertions(+), 26 deletions(-)

diff --git a/src/parallel-rustc.md b/src/parallel-rustc.md
index 5a4472755..a15f33123 100644
--- a/src/parallel-rustc.md
+++ b/src/parallel-rustc.md
@@ -1,11 +1,12 @@
 # Parallel Compilation
 
 As of August 2022, the only stage of the compiler that is
-parallel is codegen. Some parts of the compiler have parallel implementations,
-such as query evaluation, type check and [monomorphization][monomorphization],
-but the general version of the compiler does not include parallelization
-functions. **To try out the current parallel compiler**, install `rustc` from
-source code with `parallel-compiler = true` in the `Config.toml`.
+parallel is [code generation stage][codegen] (codegen). Some parts of the
+compiler have parallel implementations, such as query evaluation, type check
+and [monomorphization][monomorphization], but the general version of the
+compiler does not include parallelization functions. **To try out the current
+parallel compiler**, install `rustc` from source code with `parallel-compiler =
+true` in the `Config.toml`.
 
 The lack of parallelism at other stages (for example, macro expansion) also
 represents an opportunity for improving compiler performance.
@@ -13,18 +14,22 @@ represents an opportunity for improving compiler performance.
 These next few sections describe where and how parallelism is currently used,
 and the current status of making parallel compilation the default in `rustc`.
 
+[codegen]: backend/codegen.md
+
 ## Code Generation
 
 During monomorphization the compiler splits up all the code to
 be generated into smaller chunks called _codegen units_. These are then generated
 by independent instances of LLVM running in parallel. At the end, the linker
 is run to combine all the codegen units together into one binary. This process
-occurs in the `rustc_codegen_ssa::base` module.
+occurs in the [`rustc_codegen_ssa::base`] module.
+
+[`rustc_codegen_ssa::base`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/index.html
 
 ## Data Structures
 
 The underlying thread-safe data-structures used in the parallel compiler
-can be found in the `rustc_data_structures::sync` module. These data structures
+can be found in the [`rustc_data_structures::sync`] module. These data structures
 are implemented differently depending on whether `parallel-compiler` is true.
 
 | data structure | parallel | non-parallel |
@@ -54,24 +59,29 @@ are implemented differently depending on whether `parallel-compiler` is true.
 - On the other hand, we still need to figure out what other invariants
   during compilation might not hold in parallel compilation.
 
-### WorkLocal
+[`rustc_data_structures::sync`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/sync/index.html
 
-`WorkLocal` is a special data structure implemented for parallel compilers. It
+### WorkerLocal
+
+[`WorkerLocal`] is a special data structure implemented for parallel compilers. It
 holds worker-locals values for each thread in a thread pool. You can only
 access the worker local value through the `Deref` `impl` on the thread pool it
 was constructed on. It panics otherwise.
 
-`WorkLocal` is used to implement the `Arena` allocator in the parallel
-environment, which is critical in parallel queries. Its implementation
-is located in the `rustc-rayon-core::worker_local` module. However, in the
-non-parallel compiler, it is implemented as `(OneThread)`, whose `T`
+`WorkerLocal` is used to implement the `Arena` allocator in the parallel
+environment, which is critical in parallel queries. Its implementation is
+located in the [`rustc_data_structures::sync::worker_local`] module. However,
+in the non-parallel compiler, it is implemented as `(OneThread)`, whose `T`
 can be accessed directly through `Deref::deref`.
 
+[`rustc_data_structures::sync::worker_local`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/sync/worker_local/index.html
+[`WorkerLocal`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/sync/worker_local/struct.WorkerLocal.html
+
 ## Parallel Iterator
 
-The parallel iterators provided by the [`rayon`] crate are easy ways
-to implement parallelism. In the current implementation of the parallel
-compiler we use a custom [fork][rustc-rayon] of [`rayon`] to run tasks in parallel.
+The parallel iterators provided by the [`rayon`] crate are easy ways to
+implement parallelism. In the current implementation of the parallel compiler
+we use a custom [fork][rustc-rayon] of `rayon` to run tasks in parallel.
 
 Some iterator functions are implemented to run loops in parallel
 when `parallel-compiler` is true.
@@ -87,10 +97,9 @@ when `parallel-compiler` is true.
 | **ModuleItems::par_impl_items**(&self, f: impl Fn(ImplItemId)) | run `f` on all impl items in the module | rustc_middle::hir |
 | **ModuleItems::par_foreign_items**(&self, f: impl Fn(ForeignItemId)) | run `f` on all foreign items in the module | rustc_middle::hir |
 
-There are a lot of loops in the compiler which can possibly be
-parallelized using these functions. As of August
-2022, scenarios where the parallel iterator function has been used
-are as follows:
+There are a lot of loops in the compiler which can possibly be parallelized
+using these functions. As of August 2022, scenarios where
+the parallel iterator function has been used are as follows:
 
 | caller | scenario | callee |
 | ------------------------------------------------------- | ------------------------------------------------------------ | ------------------------ |
@@ -112,7 +121,7 @@ There are still many loops that have the potential to use parallel iterators.
 ## Query System
 
 The query model has some properties that make it actually feasible to evaluate
-multiple queries in parallel without too much of an effort:
+multiple queries in parallel without too much effort:
 
 - All data a query provider can access is via the query context, so
   the query context can take care of synchronizing access.
@@ -134,14 +143,15 @@ When a query `foo` is evaluated, the cache table for `foo` is locked.
   the compiler uses an extra thread *(named deadlock handler)* to detect,
   remove and report the cycle error.
 
-Parallel query still has a lot of work to do, most of which is related to
-the previous `Data Structures` and `Parallel Iterators`. See [this tracking issue][tracking].
+The parallel query feature still has implementation work to do, most of which is
+related to the previous `Data Structures` and `Parallel Iterators`. See [this
+open feature tracking issue][tracking].
 
 ## Rustdoc
 
-As of November 2022, there are still a number of steps
-to complete before `rustdoc` rendering can be made parallel (see a discussion of
-[parallel `rustdoc`][parallel-rustdoc]).
+As of November 2022, there are still a number of steps to
+complete before `rustdoc` rendering can be made parallel (see an open discussion
+of [parallel `rustdoc`][parallel-rustdoc]).
 
 ## Resources
 

From 98199758b8c7e2049de9e1428fe725027ccce7e8 Mon Sep 17 00:00:00 2001
From: Jieyou Xu
Date: Fri, 8 Nov 2024 16:56:01 +0800
Subject: [PATCH 3/3] Slightly update parallel front end overview, backlink to tracking issue

Co-authored-by: SparrowLii
---
 src/parallel-rustc.md | 37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/src/parallel-rustc.md b/src/parallel-rustc.md
index a15f33123..be51f21b4 100644
--- a/src/parallel-rustc.md
+++ b/src/parallel-rustc.md
@@ -1,18 +1,29 @@
 # Parallel Compilation
 
-As of August 2022, the only stage of the compiler that is
-parallel is [code generation stage][codegen] (codegen). Some parts of the
-compiler have parallel implementations, such as query evaluation, type check
-and [monomorphization][monomorphization], but the general version of the
-compiler does not include parallelization functions. **To try out the current
-parallel compiler**, install `rustc` from source code with `parallel-compiler =
-true` in the `Config.toml`.
-
-The lack of parallelism at other stages (for example, macro expansion) also
-represents an opportunity for improving compiler performance.
-
-These next few sections describe where and how parallelism is currently used,
-and the current status of making parallel compilation the default in `rustc`.
+
+The parallel front-end is currently (as of November 2024) undergoing significant
+changes, so this page contains quite a bit of outdated information.
+
+Tracking issue:
+
+
+As of November 2024, most of the Rust compiler is now
+parallelized.
+
+- The codegen part is executed concurrently by default. You can use the `-C
+  codegen-units=n` option to control the number of concurrent tasks.
+- The parts after HIR lowering and before codegen, such as type checking, borrow
+  checking, and MIR optimization, are parallelized in the nightly version.
+  Currently, they are executed in serial by default, and parallelization is
+  manually enabled by the user using the `-Z threads=n` option.
+- Other parts, such as lexical parsing, HIR lowering, and macro expansion, are
+  still executed in serial mode.
+
+
+The following sections are kept for now but are quite outdated.
+
+
+---
 
 [codegen]: backend/codegen.md