
Draft: dealing with data too large for a single buffer #6138

Draft · wants to merge 27 commits into base: trunk

Conversation

@alphastrata commented Aug 20, 2024

Connections
Discussion thread on Matrix.

Description
The aim of this new example is to demonstrate taking a large input dataset, splitting it into chunks so it can be moved onto the GPU, and then treating it as a single contiguous data structure once it is there.
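
For illustration, here's a minimal sketch of the upload side. This is not the PR's code: the function name, the byte-slice input, and the storage-buffer usage flags are all assumptions.

```rust
// Hypothetical sketch: split a large host-side slice into chunks that
// each fit within a single buffer, and upload one buffer per chunk.
// Assumes `max_buffer_size` (and hence each chunk) is 4-byte aligned,
// as wgpu requires for `write_buffer`.
fn upload_in_chunks(
    device: &wgpu::Device,
    queue: &wgpu::Queue,
    data: &[u8],
    max_buffer_size: usize,
) -> Vec<wgpu::Buffer> {
    data.chunks(max_buffer_size)
        .map(|chunk| {
            let buffer = device.create_buffer(&wgpu::BufferDescriptor {
                label: Some("chunk buffer"),
                size: chunk.len() as u64,
                usage: wgpu::BufferUsages::STORAGE | wgpu::BufferUsages::COPY_DST,
                mapped_at_creation: false,
            });
            queue.write_buffer(&buffer, 0, chunk);
            buffer
        })
        .collect()
}
```

Each buffer is then bound separately, while the shader indexes across them as one logical array.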

Testing
There's a single test that runs the same call made in the example itself: it allocates what should be two buffers' worth of zeros (on the CI GPU), then has the GPU add 1 to each element.
The lengths of the input and output are asserted to be equal.
The contents of the returned array are asserted to all be 1.
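
For reference, the described round trip might look roughly like this as a test, where `execute_gpu` is a hypothetical stand-in for the call the example makes (and `f32` elements are an assumption):

```rust
// Hypothetical sketch of the round-trip test described above.
#[test]
fn add_one_across_two_buffers() {
    // Two buffers' worth of zeros, sized for the CI GPU.
    let input = vec![0.0_f32; 2 * (1 << 25)];
    // `execute_gpu` uploads the data across multiple buffers, runs the
    // compute shader that adds 1 to every element, and reads it back.
    let output = pollster::block_on(execute_gpu(&input));

    assert_eq!(input.len(), output.len());
    assert!(output.iter().all(|&v| v == 1.0));
}
```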

Checklist

  • Run cargo fmt.
  • Run cargo clippy. If applicable, add:
    • --target wasm32-unknown-unknown
    • --target wasm32-unknown-emscripten
  • Run cargo xtask test to run tests.
  • Add change to CHANGELOG.md. See simple instructions inside file.

@alphastrata marked this pull request as ready for review on August 20, 2024 at 22:28
@alphastrata requested a review from a team as a code owner on August 20, 2024 at 22:28
@alphastrata changed the title from "DRAFT: dealing with data too large for a single buffer" to "dealing with data too large for a single buffer" on Aug 22, 2024
@cwfitzgerald (Member) left a comment

Sorry for the long wait time for a review!

Frankly, as it exists right now, we cannot accept this example. While it physically shows one strategy for dealing with large data sets, after reading it the user doesn't get a good idea of why that strategy should be used and what problems it avoids, compared to the more naive strategy of using larger and larger buffers. Through inline code comments and verbiage in the README, a reader who has no idea about any of these topics (or even the details of memory allocation) should be able to understand why this is an effective strategy.

Some things I think it should touch on:

  • Large buffers may fail to allocate due to fragmentation.
  • Growing/shrinking a dataset held in a single large buffer requires copying the entire buffer contents, whereas paginated data just requires rebuilding a bind group (see the sketch after this list).
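
A rough illustration of the second point (not code from this PR; `pages`, the page size, and the bind-group layout are all assumptions): growing the dataset by one page allocates only the new page and rebuilds the bind group, rather than copying everything into a bigger buffer.

```rust
// Hypothetical sketch: grow a paginated dataset by one page.
// NOTE: assumes `layout` declares one storage-buffer binding per page;
// growing past the layout's entry count would also require a new
// layout (and pipeline).
fn grow_by_one_page(
    device: &wgpu::Device,
    layout: &wgpu::BindGroupLayout,
    pages: &mut Vec<wgpu::Buffer>,
    page_size: u64,
) -> wgpu::BindGroup {
    // Allocate only the new page; existing pages are untouched.
    pages.push(device.create_buffer(&wgpu::BufferDescriptor {
        label: Some("page"),
        size: page_size,
        usage: wgpu::BufferUsages::STORAGE | wgpu::BufferUsages::COPY_DST,
        mapped_at_creation: false,
    }));

    // Rebuilding the bind group is cheap compared to copying the whole
    // dataset into a larger buffer.
    let entries: Vec<wgpu::BindGroupEntry> = pages
        .iter()
        .enumerate()
        .map(|(i, buf)| wgpu::BindGroupEntry {
            binding: i as u32,
            resource: buf.as_entire_binding(),
        })
        .collect();

    device.create_bind_group(&wgpu::BindGroupDescriptor {
        label: Some("pages bind group"),
        layout,
        entries: &entries,
    })
}
```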

I'm not going to close this, as I do think this can be transformed into something that would be great to have.

Added a few incidental comments.

@@ -0,0 +1 @@

@cwfitzgerald (Member) commented:

Empty file? This example definitely needs tests

@alphastrata (Author) replied:

I could use a bit of advice on the test(s) for this: the included shader relies on traversing the contiguous array (made from the multiple buffers) via an OFFSET, and it currently needs to know this information ahead of compile time, hence the consts.

```wgsl
const OFFSET: u32 = 1u << 8u;                      // step used when traversing the combined array
const BUFF_LENGTH: u32 = 1u << 25u;                // length of each individual buffer
const NUM_BUFFERS: u32 = 2u;                       // how many buffers the data is split across
const TOTAL_SIZE: u32 = BUFF_LENGTH * NUM_BUFFERS; // combined length of all buffers
```

This of course assumes these values make sense for the test environment; I've chosen them based on the test output from previous runs.

I don't know of a better way in WGSL to assign suitable values to these dynamically at runtime, as opposed to what I've done here.

If any maintainers know a cleaner or more idiomatic way we could tweak the shader to avoid doing this, that'd be awesome.
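
One possible alternative (not something this PR does, and assuming a wgpu of roughly this PR's vintage with pipeline-overridable constants): declare the values as WGSL `override`s and set them from the Rust side at pipeline creation. Storage arrays are typically runtime-sized (`array<f32>`), so the sizes often aren't needed at shader compile time anyway.

```rust
use std::collections::HashMap;

// Hypothetical sketch: supply shader constants at pipeline-creation time
// via WGSL `override` declarations instead of hard-coded `const`s.
// The WGSL would declare, e.g.:
//     override BUFF_LENGTH: u32;
//     override NUM_BUFFERS: u32;
fn create_pipeline(
    device: &wgpu::Device,
    module: &wgpu::ShaderModule,
    buff_length: u32,
    num_buffers: u32,
) -> wgpu::ComputePipeline {
    let constants = HashMap::from([
        ("BUFF_LENGTH".to_string(), buff_length as f64),
        ("NUM_BUFFERS".to_string(), num_buffers as f64),
    ]);

    device.create_compute_pipeline(&wgpu::ComputePipelineDescriptor {
        label: Some("big-compute-buffers pipeline"),
        layout: None,
        module,
        entry_point: "main", // entry-point name assumed
        compilation_options: wgpu::PipelineCompilationOptions {
            constants: &constants,
            ..Default::default()
        },
        cache: None,
    })
}
```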

@alphastrata (Author) replied:

On the fragmentation and pagination notes:

  1. For the fragmentation point, is a warning similar to what's mentioned in the doc-string on wgpu::Limits enough? We could also link to some external docs, perhaps https://developer.nvidia.com/docs/drive/drive-os/archives/6.0.4/linux/sdk/common/topics/graphics_content/avoiding_memory_fragmentation.html

  2. Perhaps a pagination example is also needed, which this example could link to? [I would be interested in working on that.]

examples/src/big_compute_buffers/README.md (review thread marked outdated and resolved)
@alphastrata (Author) commented:

> Sorry for the long wait time for a review! […]

Cheers, I'll keep working on it.

@alphastrata changed the title from "dealing with data too large for a single buffer" to "Draft: dealing with data too large for a single buffer" on Sep 29, 2024
@alphastrata marked this pull request as draft on September 29, 2024 at 00:10
@alphastrata (Author) left a comment

Added a test.

Asked some questions.

Attempted to simplify.

