[Merged by Bors] - improve compile time by type-erasing wgpu structs #5950

Closed

@robtfm robtfm commented Sep 11, 2022

Objective

structs containing wgpu types take a long time to compile. this is particularly bad for generics containing the wgpu structs (like the depth pipeline builder with #[derive(SystemParam)] i've been working on).

we can avoid that by boxing and type-erasing in the bevy render_resource wrappers.

type system magic is not a strength of mine so i guess there will be a cleaner way to achieve this, happy to take feedback or for it to be taken as a proof of concept if someone else wants to do a better job.

Solution

  • add macros to box and type-erase in debug mode
  • leave current impl for release mode
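
A minimal sketch of the debug/release split in the bullets above. It swaps in `Arc<dyn Any + Send + Sync>` as a safe way to hide the inner type, which is not the mechanism the PR's macros use, and whether it gives the same compile-time win is not measured here. `HeavyGpuType`, `ErasedHeavy` and `erased_wrapper!` are hypothetical names.

```rust
use std::any::Any;
use std::ops::Deref;
use std::sync::Arc;

// Hypothetical stand-in for a type-heavy wgpu struct.
pub struct HeavyGpuType {
    pub label: String,
}

// Debug builds: the field type no longer names the inner type.
#[cfg(debug_assertions)]
macro_rules! erased_wrapper {
    ($wrapper:ident, $inner:ty) => {
        #[derive(Clone)]
        pub struct $wrapper(Arc<dyn Any + Send + Sync>);

        impl $wrapper {
            pub fn new(value: $inner) -> Self {
                Self(Arc::new(value))
            }
        }

        impl Deref for $wrapper {
            type Target = $inner;
            fn deref(&self) -> &Self::Target {
                // `new` is the only constructor, so the downcast cannot fail.
                self.0
                    .downcast_ref::<$inner>()
                    .expect("wrapper always holds its declared type")
            }
        }
    };
}

// Release builds: plain typed storage, no downcast at runtime.
#[cfg(not(debug_assertions))]
macro_rules! erased_wrapper {
    ($wrapper:ident, $inner:ty) => {
        #[derive(Clone)]
        pub struct $wrapper(Arc<$inner>);

        impl $wrapper {
            pub fn new(value: $inner) -> Self {
                Self(Arc::new(value))
            }
        }

        impl Deref for $wrapper {
            type Target = $inner;
            fn deref(&self) -> &Self::Target {
                &self.0
            }
        }
    };
}

erased_wrapper!(ErasedHeavy, HeavyGpuType);

fn main() {
    let a = ErasedHeavy::new(HeavyGpuType { label: "pipeline".to_string() });
    let b = a.clone();
    assert_eq!(b.label, "pipeline");
}
```

Both arms expose the same `new`/`Deref` surface, so calling code does not need to know which build profile is active.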

timings:

| build | item | time | diff |
| -- | -- | -- | -- |
| current | Total time | 64.9s | |
| | bevy_pbr v0.9.0-dev | 19.2s | |
| | bevy_render v0.9.0-dev | 17.0s | |
| | bevy_sprite v0.9.0-dev | 15.1s | |
| | DepthPipelineBuilder | 18.7s | |
| with type-erasing | Total time | 49.0s | -24% |
| | bevy_render v0.9.0-dev | 12.0s | -38% |
| | bevy_pbr v0.9.0-dev | 8.7s | -49% |
| | bevy_sprite v0.9.0-dev | 6.1s | -60% |
| | DepthPipelineBuilder | 1.2s | -94% |

the depth pipeline builder is a binary with body:

use std::{marker::PhantomData, hash::Hash};
use bevy::{prelude::*, ecs::system::SystemParam, pbr::{RenderMaterials, MaterialPipeline, ShadowPipeline}, render::{renderer::RenderDevice, render_resource::{SpecializedMeshPipelines, PipelineCache}, render_asset::RenderAssets}};

fn main() {
    println!("Hello, world p!\n");
}

#[derive(SystemParam)]
pub struct DepthPipelineBuilder<'w, 's, M: Material> 
where M::Data: Eq + Hash + Clone,
{
    render_device: Res<'w, RenderDevice>,
    material_pipeline: Res<'w, MaterialPipeline<M>>,
    material_pipelines: ResMut<'w, SpecializedMeshPipelines<MaterialPipeline<M>>>,
    shadow_pipeline: Res<'w, ShadowPipeline>,
    pipeline_cache: ResMut<'w, PipelineCache>,
    render_meshes: Res<'w, RenderAssets<Mesh>>,
    render_materials: Res<'w, RenderMaterials<M>>,
    msaa: Res<'w, Msaa>,
    #[system_param(ignore)]
    _p: PhantomData<&'s M>,
}

@bjorn3 bjorn3 added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times P-Compile-Failure A failure to compile Bevy apps and removed P-Compile-Failure A failure to compile Bevy apps labels Sep 11, 2022
@bjorn3
Contributor

bjorn3 commented Sep 11, 2022

I'm curious how this improves compile times. It looks like just as many functions need to be codegened, if not even more for the type-erased variants.

@robtfm
Contributor Author

robtfm commented Sep 11, 2022

I'm curious how this improves compile times. It looks like just as many functions need to be codegened, if not even more for the type-erased variants.

yes it's weird, and probably something that should be improved in the compiler. as a guess, maybe the size for a struct containing generics needs to be recalculated for each generic combination, even when the generic is not a direct member (like in a box or an arc) and doesn't actually affect the size? the wgpu structs contain some very generic- and associated type-heavy code that is probably causing an exponential type explosion.

@mockersf
Member

This PR doesn't compile in release on my mac

error[E0515]: cannot return reference to local variable `untyped`
   --> crates/bevy_render/src/render_resource/resource_macros.rs:57:9
    |
57  |         &$value
    |         ^^^^^^^ returns a reference to data owned by the current function
    |
   ::: crates/bevy_render/src/render_resource/pipeline_cache.rs:302:9
    |
302 |         render_resource_ref!(untyped, wgpu::PipelineLayout)
    |         --------------------------------------------------- in this macro invocation
    |
    = note: this error originates in the macro `render_resource_ref` (in Nightly builds, run with -Z macro-backtrace for more info)
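
For context, E0515 fires whenever a function returns a reference that borrows from one of its own locals. A minimal sketch of that shape and one way around it follows. The types `PipelineLayout` and `CachedLayout` and the macro `resource_ref!` are hypothetical stand-ins, not the contents of `resource_macros.rs`, and the fix that actually landed is not shown in this thread.

```rust
// Hypothetical stand-ins; not the real bevy or wgpu types.
struct PipelineLayout {
    id: u32,
}

struct CachedLayout {
    layout: PipelineLayout,
}

// Pass-through arm: the argument is already a reference, so the macro must
// not add another `&`, which would borrow the local binding itself.
macro_rules! resource_ref {
    ($value:expr, $ty:ty) => {{
        let r: &$ty = $value;
        r
    }};
}

// Failing shape, kept as a comment because it does not compile:
//
// fn get_layout(cache: &CachedLayout) -> &PipelineLayout {
//     let untyped = &cache.layout;
//     &untyped // error[E0515]: borrows the local `untyped`
// }

fn get_layout(cache: &CachedLayout) -> &PipelineLayout {
    // `untyped` is a reference into `cache`, which outlives this call.
    let untyped = &cache.layout;
    resource_ref!(untyped, PipelineLayout)
}

fn main() {
    let cache = CachedLayout { layout: PipelineLayout { id: 7 } };
    assert_eq!(get_layout(&cache).id, 7);
}
```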

@robtfm
Contributor Author

robtfm commented Sep 11, 2022

This PR doesn't compile in release on my mac

fixed. will address the other feedback tomorrow if there's positive disposition.

@bjorn3
Contributor

bjorn3 commented Sep 12, 2022

This seems to be the main contributing factor for bevy_render compiling faster (70% of perf improvement):

+-----------------------------------------------------+---------------+------------------+---------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| Item                                                | Self Time     | Self Time Change | Time          | Time Change | Item count | Cache hits | Blocked time | Incremental load time | Incremental hashing time |
+-----------------------------------------------------+---------------+------------------+---------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| evaluate_obligation                                 | -5.14754422s  | -90.82%          | -5.147455645s | -88.89%     | -3939      | +0         | +0ns         | +0ns                  | -1.646728ms              |
+-----------------------------------------------------+---------------+------------------+---------------+-------------+------------+------------+--------------+-----------------------+--------------------------+

Additionally it saves ~30MB of ~200MB on the file size of libbevy_render.rlib. Similar results for bevy_pbr (50% of perf improvement):

+-----------------------------------------------------+---------------+------------------+---------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| Item                                                | Self Time     | Self Time Change | Time          | Time Change | Item count | Cache hits | Blocked time | Incremental load time | Incremental hashing time |
+-----------------------------------------------------+---------------+------------------+---------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| evaluate_obligation                                 | -8.455474125s | -95.79%          | -8.477643881s | -94.68%     | -8198      | +0         | +0ns         | +0ns                  | -2.736299ms              |
+-----------------------------------------------------+---------------+------------------+---------------+-------------+------------+------------+--------------+-----------------------+--------------------------+

And saving ~50MB of ~140MB.

(measurements with single threaded compilation and incremental cache cleared)

@robtfm
Contributor Author

robtfm commented Sep 12, 2022

maybe related: https://fasterthanli.me/articles/when-rustc-explodes#what-now

sensitivity check: is BlackBox acceptable for a wrapper that contains a boxed type-erased value?

@@ -0,0 +1,102 @@
#[cfg(debug_assertions)]
#[macro_export]
macro_rules! render_resource_wrapper {
Member

I think it's worth adding some documentation for the rationale behind this wrapper (and the criteria for removing it in the future?)

@MinerSebas
Contributor

MinerSebas commented Sep 22, 2022

In addition to the compilation benchmark, a runtime/render benchmark should be performed, so that it is known whether this has any impact here, due to missed optimizations.

Edit: Missed the #[cfg(debug_assertions)] on the macro, to only use this for debug builds. So no render benchmark necessary. 😓

@robtfm
Contributor Author

robtfm commented Sep 23, 2022

you sent me down a rabbit hole of newtypes not being zero-cost before i read the edit :) ... apparently newtypes are only zero cost as long as they have a non-calloc style initialization, which would be true here since we're wrapping an arc which needs an initialized counter.

macro_rules! render_resource_wrapper {
($wrapper_type:ident, $wgpu_type:ty) => {
#[derive(Clone, Debug)]
pub struct $wrapper_type(Option<std::sync::Arc<Box<()>>>);
Member

I'm confused by the Option here. Why is this needed?

Contributor Author

it is so that we can take it in try_unwrap without having to do anything special for the Drop impl to work.

If we don't use an option, we have to std::mem::forget and/or ManuallyDrop<> (on self or self.0 or both..?), or std::mem::replace(self.0) (which creates a new Arc<Box<()>> just to drop it again in Drop).

I can do one of those if you prefer, I will need to dig around to get comfortable with it, but since this is #[cfg(debug_assertions)] code perhaps it's not necessary?

Member

Ok that makes sense to me. We could solve the try_unwrap problem with another layer of wrapper types (so we can implement drop on the internal type, making it possible to move out on try_unwrap()). But implementing drop for that internal type is still problematic.
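
A minimal sketch of the `Option` trick discussed in this thread, assuming a hypothetical `Wrapper` in which `String` stands in for the wrapped wgpu value. It only illustrates why `take()` sidesteps the move-out-of-`Drop` restriction; it is not the macro from the PR.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Counts how many times the custom Drop path actually released a value.
static RELEASED: AtomicUsize = AtomicUsize::new(0);

// Hypothetical wrapper; `String` stands in for the wrapped wgpu value.
struct Wrapper(Option<Arc<String>>);

impl Wrapper {
    fn new(value: String) -> Self {
        Self(Some(Arc::new(value)))
    }

    // Returns the inner value if this is the last handle to it.
    fn try_unwrap(mut self) -> Option<String> {
        // `let inner = self.0;` would be rejected (E0509) because `Wrapper`
        // implements `Drop`. `take()` leaves `None` behind instead.
        let inner = self.0.take()?;
        Arc::try_unwrap(inner).ok()
    }
}

impl Drop for Wrapper {
    fn drop(&mut self) {
        // After `try_unwrap` the slot is `None`, so there is nothing to do.
        if let Some(inner) = self.0.take() {
            RELEASED.fetch_add(1, Ordering::Relaxed);
            drop(inner);
        }
    }
}

fn main() {
    let unique = Wrapper::new("layout".to_string());
    assert_eq!(unique.try_unwrap(), Some("layout".to_string()));
    assert_eq!(RELEASED.load(Ordering::Relaxed), 0);

    drop(Wrapper::new("sampler".to_string()));
    assert_eq!(RELEASED.load(Ordering::Relaxed), 1);
}
```

The alternatives mentioned above (`ManuallyDrop`, `mem::forget`, `mem::replace`) avoid the extra `Option` but need more care around the `Drop` impl.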

pub struct $wrapper_type(Option<std::sync::Arc<Box<()>>>);

impl $wrapper_type {
pub fn new(value: $wgpu_type) -> Self {
Contributor

@coreh coreh Nov 17, 2022

If $wgpu_type is !Send or !Sync, won't this (and other fn's that do mem::transmute) produce potentially unsound results, since () is Send and Sync?

Not sure if it would work, but perhaps something like:

Suggested change
- pub fn new(value: $wgpu_type) -> Self {
+ pub fn new(value: $wgpu_type) -> Self where $wgpu_type: Send + Sync {

That could stop someone from inadvertently using this for something that isn't Send + Sync down the line?
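
A minimal sketch of how such a bound behaves inside a macro, assuming a hypothetical `checked_wrapper!`. The storage is left typed so the sketch stays safe; the point is only the `where` clause on a concrete type.

```rust
use std::sync::Arc;

// Hypothetical macro; not the one from the PR.
macro_rules! checked_wrapper {
    ($wrapper:ident, $inner:ty) => {
        pub struct $wrapper(Arc<$inner>);

        impl $wrapper {
            // The bound mentions no generic parameters, so it is checked
            // where this item is defined rather than at each call site.
            pub fn new(value: $inner) -> Self
            where
                $inner: Send + Sync,
            {
                Self(Arc::new(value))
            }

            pub fn get(&self) -> &$inner {
                &self.0
            }
        }
    };
}

checked_wrapper!(SafeCounter, u64);

// Uncommenting the next line is expected to fail to compile, because `Rc`
// is neither `Send` nor `Sync`:
// checked_wrapper!(UnsafeCounter, std::rc::Rc<u64>);

fn main() {
    let c = SafeCounter::new(3);
    assert_eq!(*c.get(), 3);
}
```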

Contributor Author

good catch, thanks!

@cart
Member

cart commented Nov 18, 2022

bors r+

bors bot pushed a commit that referenced this pull request Nov 18, 2022
@bors bors bot changed the title improve compile time by type-erasing wgpu structs [Merged by Bors] - improve compile time by type-erasing wgpu structs Nov 18, 2022
@bors bors bot closed this Nov 18, 2022
taiyoungjang pushed a commit to taiyoungjang/bevy that referenced this pull request Dec 15, 2022
let inner = self.0.take();
if let Some(inner) = inner {
let _ = unsafe {
std::mem::transmute::<
Member

@james7132 james7132 Jan 19, 2023

This is potentially UB. std::sync::Arc is repr(Rust), so we can't be sure that this transmute is sound.

Same thing holds true for the other transmutes in this file.

alradish pushed a commit to alradish/bevy that referenced this pull request Jan 22, 2023
ItsDoot pushed a commit to ItsDoot/bevy that referenced this pull request Feb 1, 2023
bors bot pushed a commit that referenced this pull request Feb 6, 2023
# Objective

[as noted](#5950 (comment)) by james, transmuting arcs may be UB.
 
we now store a `*const ()` pointer internally, and only rely on `ptr.cast::<()>().cast::<T>() == ptr`.

as a happy side effect this removes the need for boxing the value, so todo: potentially use this for release mode as well
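
A minimal sketch of the pointer-based idea described above, assuming a hypothetical `GpuBuffer` payload and `ErasedBuffer` wrapper. It round-trips through `Arc::into_raw` / `Arc::from_raw` with pointer casts instead of transmuting the `Arc` itself; it is an illustration under those assumptions, not the merged code.

```rust
use std::ops::Deref;
use std::sync::Arc;

// Hypothetical stand-in for a wgpu type.
pub struct GpuBuffer {
    pub size: u64,
}

// Compile-time check that the erased payload really is thread-safe, since
// the `unsafe impl`s below would otherwise hide a `!Send` or `!Sync` type.
const fn assert_send_sync<T: Send + Sync>() {}
const _: () = assert_send_sync::<GpuBuffer>();

// Holds a pointer obtained from `Arc::into_raw`, with the pointee type erased.
pub struct ErasedBuffer(*const ());

impl ErasedBuffer {
    pub fn new(value: GpuBuffer) -> Self {
        Self(Arc::into_raw(Arc::new(value)).cast::<()>())
    }
}

impl Deref for ErasedBuffer {
    type Target = GpuBuffer;
    fn deref(&self) -> &GpuBuffer {
        // SAFETY: the pointer came from `Arc::into_raw` for a `GpuBuffer`,
        // and this handle keeps one strong count alive.
        unsafe { &*self.0.cast::<GpuBuffer>() }
    }
}

impl Clone for ErasedBuffer {
    fn clone(&self) -> Self {
        // SAFETY: same provenance as above; adds one strong count for the copy.
        unsafe { Arc::increment_strong_count(self.0.cast::<GpuBuffer>()) };
        Self(self.0)
    }
}

impl Drop for ErasedBuffer {
    fn drop(&mut self) {
        // SAFETY: rebuilds the `Arc` to release this handle's strong count.
        unsafe { drop(Arc::from_raw(self.0.cast::<GpuBuffer>())) };
    }
}

// SAFETY: justified by the `assert_send_sync` check on the payload type.
unsafe impl Send for ErasedBuffer {}
unsafe impl Sync for ErasedBuffer {}

fn main() {
    let a = ErasedBuffer::new(GpuBuffer { size: 256 });
    let b = a.clone();
    drop(a);
    assert_eq!(b.size, 256);
}
```

Because the `Arc` allocation is reused directly, no extra `Box` is needed, which matches the side effect noted in the commit message.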
bors bot pushed a commit to bevyengine/bevy-website that referenced this pull request Mar 6, 2023
## How This Works

For the Bevy 0.10 release blog post (and for the first time ever), I'm publicly opening the doors to other people writing blog post sections. Specifically, if you worked on a feature in a substantial way and are interested in presenting it, you can now ask to claim a section by leaving a comment in this PR. If you claim a section, submit a pull request to the `release-0.10.0` branch in this repo. For the next week, we will be filling in sections (the release target is Saturday March 4th). Please don't claim a section if you don't plan on completing it within that timeline. Also don't claim a section if you weren't an active participant in the design and implementation of the change (unless you are a Maintainer or SME).

I will claim any unclaimed sections.

Try to match the style of previous release blog posts as much as possible.

1. Show, don't tell. Don't bombard people with information. Avoid large walls of text _and_ large walls of code. Prefer the pattern "bite-sized description of one thing" -> "example code/picture/video contextualizing that one thing" -> repeat. Take readers on a journey step by simple step.
2. Don't use up readers' "mental bandwidth" without good reason. We can't afford page-long descriptions of minor bug fixes. If it isn't a "headliner change", keep the description short and sweet. If a change is self-describing, let it do that (ex: We now support this new mesh shape primitive ... this is what it looks like). If it is a "headliner change", still try to keep it reasonable. We always have a lot to cover.
3. In slight competition with point (2), don't omit interesting technical information when it is truly fun and engaging. A good chunk of our users are highly technical and enjoy learning how the sausage is made. Try to strike a balance between "terse and simple" and "nerdy details".
4. When relevant, briefly describe the problem being solved first, then describe the solution we chose. This contextualizes the change and gives the feature value and purpose.
5. When possible, provide visuals. They create interest / keep people hooked / break up the monotony.
6. Record images and videos at the default bevy resolution (1280x720)
7. Provide an accurate listing of authors that meaningfully contributed to the feature. Try to sort in order of "contribution scale". This is hard to define, but try to be fair. When in doubt, ask other contributors, SMEs, and/or maintainers.
8. Provide numbers and graphs where possible.  If something is faster, use numbers to back it up. We don't (yet) have automated graph generation in blog post style, so send data / info to me (@cart) if you want a graph made.

## Headliners

Headliners are our "big ticket high importance / high profile" changes. They are listed briefly at the beginning of the blog post, their entries are roughly sorted "to the top", and they are given priority when it comes to "space in the blog post". If you think we missed something (or didn't prioritize something appropriately), let us know.

* ECS Schedule v3 (previously known as "stageless")  
* Partial Android Support
* Depth and Normal Prepass
* Environment Map Lighting
* Cascaded Shadow Maps
* Distance and Atmospheric Fog
* Smooth Skeletal Animation Transitions
* Enable Parallel Pipelined Rendering
* Windows as Entities
* Renderer Optimizations
* ECS Optimizations

## Sections

These are the sections we will cover in the blog post. If a section has been claimed, it will have `(claimed by X)` in the title. If it is unclaimed it will have `(unclaimed)` in the title. Let us know if we missed a section. We don't cover every feature, but we should cover pretty much everything that would be interesting to users. Note that what is interesting or challenging to implement is not necessarily something that is relevant to our blog post readers. And sometimes the reverse is true!

If you believe a section should be split up or reorganized, just bring it up here and we can discuss it.

### ~~Schedule V3 (claimed by @alice-i-cecile)~~

* [Migrate engine to Schedule v3][7267]
* [Add `bevy_ecs::schedule_v3` module][6587]
* [Stageless: fix unapplied systems][7446]
* [Stageless: move final apply outside of spawned executor][7445]
* Sets
* Base Sets
  * [Base Sets][7466]
* Reporting
  * [Report sets][7756]
  * [beter cycle reporting][7463]
* Run Conditions
  * [Add condition negation][7559]
  * [And/Or][7605]
  * [Add more common run conditions][7579]
* States
  * [States derive macro][7535]
* System Piping Flexibility
  * [Support piping exclusive systems][7023]
  * [Allow piping run conditions][7547]

### ~~Depth and Normal Prepass (claimed by @IceSentry)~~

* [Add depth and normal prepass][6284]
* [Move prepass functions to prepass_utils][7354]

### ~~Distance and Atmospheric Fog (claimed by @coreh)~~

* [Add Distance and Atmospheric Fog support][6412]

### ~~Cascaded Shadow Maps (claimed by @cart)~~

* [Cascaded shadow maps.][7064]
* [Better cascades config defaults + builder, tweak example configs][7456]

### ~~Environment Map Lighting (claimed by @cart)~~

* [EnvironmentMapLight, BRDF Improvements][7051]
* [Webgl2 support][7737]

### ~~Tonemapping options (claimed by @cart)~~

* [Initial tonemapping options][7594]

### ~~Android support + unification (claimed by @mockersf)~~

* [IOS, Android... same thing][7493]

### ~~Windows as Entities (claimed by @Aceeri)~~

* [Windows as Entities][5589]
* [break feedback loop when moving cursor][7298]
* [Fix `Window` feedback loop between the OS and Bevy][7517]

### ~~Enable Parallel Pipelined Rendering (claimed by @james7132)~~

* [Pipelined Rendering][6503]
* [Stageless: add a method to scope to always run a task on the scope thread][7415]
* [Separate Extract from Sub App Schedule][7046]

### ~~Smooth Skeletal Animation Transitions (claimed by @james7132)~~

* [Smooth Transition between Animations][6922]

### ~~Spatial Audio (claimed by @harudagondi)~~

* [Spatial Audio][6028]

### ~~Shader Processor Features (claimed by @cart)~~

* [Shader defs can now have a value][5900]
* [Shaders can now have #else ifdef chains][7431]
* [Define shader defs in shader][7518]

### ~~Shader Flexibility Improvements (claimed by @cart)~~

* [add ambient lighting hook][5428]
* [Refactor Globals and View structs into separate shaders][7512]

### ~~Renderer Optimizations (claimed by @james7132)~~

* [bevy_pbr: Avoid copying structs and using registers in shaders][7069]
* [Flatten render commands][6885]
* [Replace UUID based IDs with atomic-counted ones][6988]
* [improve compile time by type-erasing wgpu structs][5950]
* [Shrink DrawFunctionId][6944]
* [Shrink ComputedVisibility][6305]
* [Reduce branching in TrackedRenderPass][7053]
* [Make PipelineCache internally mutable.][7205]
* [Improve `Color::hex` performance][6940]
* [Support recording multiple CommandBuffers in RenderContext][7248]
* [Parallelized transform propagation][4775]
* [Introduce detailed_trace macro, use in TrackedRenderPass][7639]
* [Optimize color computation in prepare_uinodes][7311]
* [Directly extract joints into SkinnedMeshJoints][6833]
* [Parallelize forward kinematics animation systems][6785]
* [Move system_commands spans into apply_buffers][6900]
* [Reduce the use of atomics in the render phase][7084]

### ~~ECS Optimizations (claimed by @james7132)~~

* [Remove redundant table and sparse set component IDs from Archetype][4927]
* [Immutable sparse sets for metadata storage][4928]
* [Replace BlobVec's swap_scratch with a swap_nonoverlapping][4853]
* [Use T::Storage::STORAGE_TYPE to optimize out unused branches][6800]
* [Remove unnecessary branching from bundle insertion][6902]
* [Split Component Ticks][6547]
* [use bevy_utils::HashMap for better performance. TypeId is predefined …][7642]
* [Extend EntityLocation with TableId and TableRow][6681]
* [Basic adaptive batching for parallel query iteration][4777]
* [Speed up `CommandQueue` by storing commands more densely][6391]

### ~~Reflect Improvements (claimed by @cart)~~

* [bevy_reflect: Add `ReflectFromReflect` (v2)][6245]
* [Add reflection support for VecDeque][6831]
* [reflect: add `insert` and `remove` methods to `List`][7063]
* [Add `remove` method to `Map` reflection trait.][6564]
* [bevy_reflect: Fix binary deserialization not working for unit structs][6722]
* [Add `TypeRegistrationDeserializer` and remove `BorrowedStr`][7094]
* [bevy_reflect: Add simple enum support to reflection paths][6560]
* [Enable deriving Reflect on structs with generic types][7364]
* [bevy_reflect: Support tuple reflection paths][7324]
* [bevy_reflect: Pre-parsed paths][7321]
* [bevy_ecs: ReflectComponentFns without World][7206]

### ~~AsBindGroup Improvements (claimed by @cart)~~

* [Support storage buffers in derive `AsBindGroup`][6129]
* [Support raw buffers in AsBindGroup][7701]

### ~~Cylinder Shape (claimed by @cart)~~

* [Add cylinder shape][6809]

### ~~Subdividable Plane Shape (claimed by @cart)~~

* [added subdivisions to shape::Plane][7546]

### ~~StandardMaterial Blend Modes (claimed by @coreh)~~

* [Standard Material Blend Modes][6644]

### ~~Configurable Visibility Component (claimed by @cart)~~

* [enum `Visibility` component][6320]

### Task Improvements (claimed by @cart)

* [Fix panicking on another scope][6524]
* [Add thread create/destroy callbacks to TaskPool][6561]
* [Thread executor for running tasks on specific threads.][7087]
* [await tasks to cancel][6696]
* [Stageless: move MainThreadExecutor to schedule_v3][7444]
* [Stageless: close the finish channel so executor doesn't deadlock][7448]

### ~~Upgrade to wgpu 0.15 (claimed by @cart)~~

* [Wgpu 0.15][7356]

### ~~Expose Bindless / Non-uniform Indexing Support (claimed by @cart)~~

* [Request WGPU Capabilities for Non-uniform Indexing][6995]

### ~~Cubic Spline (claimed by @aevyrie)~~

* [Bezier][7653]

### ~~Revamp Bloom (claimed by @JMS55)~~

* [Revamp bloom](bevyengine/bevy#6677)

### ~~Use Prepass Shaders for Shadows (claimed by @superdump)~~

* [use prepass shaders for shadows](bevyengine/bevy#7784)

### ~~AccessKit (claimed by @alice-i-cecile)~~

* [accesskit](bevyengine/bevy#6874)

### ~~Camera Output Modes (claimed by @cart)~~

* [camera output modes](bevyengine/bevy#7671)

### ~~SystemParam Improvements (claimed by @JoJoJet)~~

* [Make the `SystemParam` derive macro more flexible][6694]
* [Add a `SystemParam` primitive for deferred mutations; allow `#[derive]`ing more types of SystemParam][6817]

### ~~Gamepad Improvements (claimed by @cart)~~

* [Gamepad events refactor][6965]
* [add `Axis::devices` to get all the input devices][5400]

### ~~Input Methods (claimed by @cart)~~

* [add Input Method Editor support][7325]

### ~~Color Improvements (claimed by @cart)~~

* [Add LCH(ab) color space to `bevy_render::color::Color`][7483]
* [Add a more familiar hex color entry][7060]

### ~~Split Up CorePlugin (claimed by @cart)~~

* [Break `CorePlugin` into `TaskPoolPlugin`, `TypeRegistrationPlugin`, `FrameCountPlugin`.][7083]

### ~~ExtractComponent Derive (claimed by @cart)~~

* [Extract component derive][7399]

### ~~Added OpenGL and DX11 Backends By Default (claimed by @cart)~~

* [add OpenGL and DX11 backends][7481]

### ~~UnsafeWorldCell (claimed by @BoxyUwU)~~

* [Move all logic to `UnsafeWorldCell`][7381]
* [Rename `UnsafeWorldCellEntityRef` to `UnsafeEntityCell`][7568]

### ~~Entity Commands (claimed by @cart)~~

* [Add a trait for commands that run for a given `Entity`][7015]
* [Add an extension trait to `EntityCommands` to update hierarchy while preserving `GlobalTransform`][7024]
* [Add ReplaceChildren and ClearChildren EntityCommands][6035]

### ~~Iterate EntityRef (claimed by @james7132)~~

* [Allow iterating with EntityRef over the entire World][6843]

### ~~Ref Queries (@JoJoJet)~~

* [Added Ref to allow immutable access with change detection][7097]

### ~~Taffy Upgrade (claimed by @cart)~~

* [Upgrade to Taffy 0.2][6743]

### ~~Relative Cursor Position (claimed by @cart)~~

* [Relative cursor position][7199]

### ~~Const UI Config (claimed by @cart)~~

* [Add const to methods and const defaults to bevy_ui][5542]

### ~~Examples (claimed by @cart)~~

* [Add pixelated Bevy to assets and an example][6408]
* [Organized scene_viewer into plugins for reuse and organization][6936]

### ~~CI Improvements (claimed by @cart)~~

* [add rust-version for MSRV and CI job to check][6852]
* [msrv: only send a message on failure during the actual msrv part][7532]
* [Make CI friendlier][7398]
* [Fix CI welcome message][7428]
* [add an action to ask for a migration guide when one is missing][7507]

### ~~SMEs (@cart)~~

This was already covered in another blog post. Just briefly call out what they are and that this is the first release that used them. Link to the other blog post.

* [Subject Matter Experts and new Bevy Org docs][7185]

[4775]: bevyengine/bevy#4775
[4777]: bevyengine/bevy#4777
[4853]: bevyengine/bevy#4853
[4927]: bevyengine/bevy#4927
[4928]: bevyengine/bevy#4928
[5400]: bevyengine/bevy#5400
[5428]: bevyengine/bevy#5428
[5542]: bevyengine/bevy#5542
[5589]: bevyengine/bevy#5589
[5900]: bevyengine/bevy#5900
[5950]: bevyengine/bevy#5950
[6028]: bevyengine/bevy#6028
[6035]: bevyengine/bevy#6035
[6129]: bevyengine/bevy#6129
[6179]: bevyengine/bevy#6179
[6245]: bevyengine/bevy#6245
[6284]: bevyengine/bevy#6284
[6305]: bevyengine/bevy#6305
[6320]: bevyengine/bevy#6320
[6391]: bevyengine/bevy#6391
[6408]: bevyengine/bevy#6408
[6412]: bevyengine/bevy#6412
[6503]: bevyengine/bevy#6503
[6524]: bevyengine/bevy#6524
[6547]: bevyengine/bevy#6547
[6557]: bevyengine/bevy#6557
[6560]: bevyengine/bevy#6560
[6561]: bevyengine/bevy#6561
[6564]: bevyengine/bevy#6564
[6587]: bevyengine/bevy#6587
[6644]: bevyengine/bevy#6644
[6649]: bevyengine/bevy#6649
[6681]: bevyengine/bevy#6681
[6694]: bevyengine/bevy#6694
[6696]: bevyengine/bevy#6696
[6722]: bevyengine/bevy#6722
[6743]: bevyengine/bevy#6743
[6785]: bevyengine/bevy#6785
[6800]: bevyengine/bevy#6800
[6802]: bevyengine/bevy#6802
[6809]: bevyengine/bevy#6809
[6817]: bevyengine/bevy#6817
[6831]: bevyengine/bevy#6831
[6833]: bevyengine/bevy#6833
[6843]: bevyengine/bevy#6843
[6852]: bevyengine/bevy#6852
[6885]: bevyengine/bevy#6885
[6900]: bevyengine/bevy#6900
[6902]: bevyengine/bevy#6902
[6922]: bevyengine/bevy#6922
[6926]: bevyengine/bevy#6926
[6936]: bevyengine/bevy#6936
[6940]: bevyengine/bevy#6940
[6944]: bevyengine/bevy#6944
[6965]: bevyengine/bevy#6965
[6988]: bevyengine/bevy#6988
[6995]: bevyengine/bevy#6995
[7015]: bevyengine/bevy#7015
[7023]: bevyengine/bevy#7023
[7024]: bevyengine/bevy#7024
[7046]: bevyengine/bevy#7046
[7051]: bevyengine/bevy#7051
[7053]: bevyengine/bevy#7053
[7060]: bevyengine/bevy#7060
[7063]: bevyengine/bevy#7063
[7064]: bevyengine/bevy#7064
[7069]: bevyengine/bevy#7069
[7083]: bevyengine/bevy#7083
[7084]: bevyengine/bevy#7084
[7087]: bevyengine/bevy#7087
[7094]: bevyengine/bevy#7094
[7097]: bevyengine/bevy#7097
[7185]: bevyengine/bevy#7185
[7199]: bevyengine/bevy#7199
[7205]: bevyengine/bevy#7205
[7206]: bevyengine/bevy#7206
[7248]: bevyengine/bevy#7248
[7267]: bevyengine/bevy#7267
[7298]: bevyengine/bevy#7298
[7311]: bevyengine/bevy#7311
[7321]: bevyengine/bevy#7321
[7324]: bevyengine/bevy#7324
[7325]: bevyengine/bevy#7325
[7354]: bevyengine/bevy#7354
[7356]: bevyengine/bevy#7356
[7364]: bevyengine/bevy#7364
[7381]: bevyengine/bevy#7381
[7398]: bevyengine/bevy#7398
[7399]: bevyengine/bevy#7399
[7415]: bevyengine/bevy#7415
[7428]: bevyengine/bevy#7428
[7431]: bevyengine/bevy#7431
[7444]: bevyengine/bevy#7444
[7445]: bevyengine/bevy#7445
[7446]: bevyengine/bevy#7446
[7448]: bevyengine/bevy#7448
[7456]: bevyengine/bevy#7456
[7463]: bevyengine/bevy#7463
[7466]: bevyengine/bevy#7466
[7481]: bevyengine/bevy#7481
[7483]: bevyengine/bevy#7483
[7493]: bevyengine/bevy#7493
[7507]: bevyengine/bevy#7507
[7510]: bevyengine/bevy#7510
[7512]: bevyengine/bevy#7512
[7517]: bevyengine/bevy#7517
[7518]: bevyengine/bevy#7518
[7532]: bevyengine/bevy#7532
[7535]: bevyengine/bevy#7535
[7546]: bevyengine/bevy#7546
[7547]: bevyengine/bevy#7547
[7559]: bevyengine/bevy#7559
[7568]: bevyengine/bevy#7568
[7579]: bevyengine/bevy#7579
[7594]: bevyengine/bevy#7594
[7605]: bevyengine/bevy#7605
[7639]: bevyengine/bevy#7639
[7642]: bevyengine/bevy#7642
[7653]: bevyengine/bevy#7653
[7701]: bevyengine/bevy#7701
[7737]: bevyengine/bevy#7737
[7756]: bevyengine/bevy#7756


Co-authored-by: François <[email protected]>
Co-authored-by: Alice Cecile <[email protected]>
Co-authored-by: Mike <[email protected]>
Co-authored-by: Boxy <[email protected]>
Co-authored-by: IceSentry <[email protected]>
Co-authored-by: JoJoJet <[email protected]>
Co-authored-by: Aevyrie <[email protected]>
Co-authored-by: James Liu <[email protected]>
Co-authored-by: Marco Buono <[email protected]>
Co-authored-by: Aceeri <[email protected]>
Labels
A-Rendering: Drawing game state to the screen
C-Performance: A change motivated by improving speed, memory usage or compile times