[js/webgpu] Destroy staging buffers aggressively during weights uploading (#22726)

In the current implementation, all staging buffers used for weights uploading
are destroyed only after the first batch of kernel execution. This requires
a lot of memory, because none of these staging buffers can be reused in the
meantime. It also hurts startup time (weights uploading only happens during
session creation), because the actual upload work is delayed until very late.
This PR submits the queue and destroys staging buffers much more aggressively:
each upload records its copy into its own command encoder, submits it
immediately, and destroys the staging buffer right after submission. This lets
the related GPU memory be reused as much as possible, though the actual effect
depends on the WebGPU and driver implementation. The aggressive queue
submission also moves GPU operations much earlier, which improves startup time.
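To illustrate why destroying staging buffers earlier matters for memory, here is a simplified model of peak staging memory under the old (deferred) and new (eager) policies. It is a hypothetical sketch, not onnxruntime code: it assumes one staging buffer per weight tensor and that `destroy()` frees memory immediately, whereas real reclamation depends on the WebGPU implementation and driver. The names `peakStagingMemory` and `StagingPolicy` and the tensor sizes are made up for illustration.

```typescript
// Hypothetical model, not onnxruntime code. Assumes one staging buffer per
// weight tensor and that destroy() frees memory immediately; real WebGPU
// implementations and drivers may reclaim it later.
type StagingPolicy = 'deferred' | 'eager';

// Returns the peak amount of memory held by staging buffers while uploading
// tensors of the given sizes (same unit as `sizes`).
//  - 'deferred': all staging buffers stay alive until the first batch of
//    kernel execution, so they all coexist.
//  - 'eager': each staging buffer is destroyed right after its copy is
//    submitted, so at most one is alive at a time.
function peakStagingMemory(sizes: readonly number[], policy: StagingPolicy): number {
  let live = 0;
  let peak = 0;
  for (const size of sizes) {
    live += size; // create the staging buffer for this tensor
    peak = Math.max(peak, live);
    if (policy === 'eager') {
      live -= size; // copy submitted, staging buffer destroyed
    }
  }
  return peak;
}

const tensorSizesMiB = [64, 256, 128, 512]; // made-up example sizes
console.log(peakStagingMemory(tensorSizesMiB, 'deferred')); // 960
console.log(peakStagingMemory(tensorSizesMiB, 'eager')); // 512
```

Under this model, the deferred policy's peak grows with the total size of all weights, while the eager policy's peak is bounded by the largest single tensor.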
Several buffer-uploading benchmarks were written to compare multiple
solutions in terms of memory and time consumption. The benchmarks can be
found at
https://github.com/webatintel/webbench/blob/master/webgpu/buffer-upload.html,
and detailed test data at
https://docs.google.com/document/d/1KgygOkb9ZNzkgzQ_tWOGlEI9ScmMBHDjDojjPFLmVXU/edit.
I also tested phi3.5 on two machines: first-inference time improved from
5141 ms to 3579 ms and from 4327 ms to 2947 ms, respectively.
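As a quick check on the phi3.5 numbers, the relative improvement works out to about 30% to 32% less time on both machines. The helper below (the names `firstInferenceGain` and `Gain` are illustrative, not part of onnxruntime) just performs that arithmetic on the two reported measurements.

```typescript
// Relative change in phi3.5 first-inference time, computed from the two
// before/after measurements quoted above (milliseconds).
interface Gain {
  reductionPct: number; // share of the original time that was saved, in %
  speedup: number; // before / after, i.e. how many times faster
}

function firstInferenceGain(beforeMs: number, afterMs: number): Gain {
  return {
    reductionPct: ((beforeMs - afterMs) / beforeMs) * 100,
    speedup: beforeMs / afterMs,
  };
}

const measurements: Array<[number, number]> = [
  [5141, 3579],
  [4327, 2947],
];

for (const [before, after] of measurements) {
  const { reductionPct, speedup } = firstInferenceGain(before, after);
  console.log(`${before} ms -> ${after} ms: ${reductionPct.toFixed(1)}% less time, ${speedup.toFixed(2)}x faster`);
}
// 5141 ms -> 3579 ms: 30.4% less time, 1.44x faster
// 4327 ms -> 2947 ms: 31.9% less time, 1.47x faster
```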
Yang Gu authored Nov 6, 2024
1 parent 742a0d3 commit 811231e
Showing 1 changed file with 3 additions and 13 deletions.
js/web/lib/wasm/jsep/webgpu/gpu-data-manager.ts (16 changes: 3 additions and 13 deletions)
```diff
@@ -191,8 +191,6 @@ class GpuDataManagerImpl implements GpuDataManager {
   // GPU Data ID => GPU Data ( storage buffer )
   private storageCache: Map<GpuDataId, StorageCacheValue>;
 
-  // pending buffers for uploading ( data is unmapped )
-  private buffersForUploadingPending: GPUBuffer[];
   // pending buffers for computing
   private buffersPending: GPUBuffer[];
 
@@ -212,7 +210,6 @@ class GpuDataManagerImpl implements GpuDataManager {
     this.storageCache = new Map();
     this.freeBuffers = new Map();
     this.freeUniformBuffers = new Map();
-    this.buffersForUploadingPending = [];
     this.buffersPending = [];
     this.capturedPendingBuffers = new Map();
 
@@ -252,13 +249,12 @@ class GpuDataManagerImpl implements GpuDataManager {
     gpuBufferForUploading.unmap();
 
     // GPU copy
-    const commandEncoder = this.backend.getCommandEncoder();
-    this.backend.endComputePass();
+    const commandEncoder = this.backend.device.createCommandEncoder();
     commandEncoder.copyBufferToBuffer(gpuBufferForUploading, 0, gpuDataCache.gpuData.buffer, 0, size);
+    this.backend.device.queue.submit([commandEncoder.finish()]);
+    gpuBufferForUploading.destroy();
 
     LOG_DEBUG('verbose', () => `[WebGPU] GpuDataManager.upload(id=${id})`);
-
-    this.buffersForUploadingPending.push(gpuBufferForUploading);
   }
 
   memcpy(sourceId: GpuDataId, destinationId: GpuDataId): void {
@@ -395,12 +391,6 @@ class GpuDataManagerImpl implements GpuDataManager {
   }
 
   refreshPendingBuffers(): void {
-    for (const buffer of this.buffersForUploadingPending) {
-      // upload buffer is only useful in the session creation time. So we don't need to reuse them in session running.
-      buffer.destroy();
-    }
-    this.buffersForUploadingPending = [];
-
     if (this.buffersPending.length === 0) {
       return;
     }
```
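The post-change upload path can be sketched as a standalone function. This is a minimal sketch under assumptions, not the library's code: `uploadEagerly`, `DeviceLike`, `BufferLike` and `EncoderLike` are hypothetical names; the interfaces are a small structural subset of WebGPU's `GPUDevice`, `GPUBuffer` and `GPUCommandEncoder` so the code runs without a browser; and the staging-buffer creation step (not shown in the diff) is assumed to be a `mappedAtCreation` buffer with `MAP_WRITE | COPY_SRC` usage.

```typescript
// Hypothetical minimal subset of WebGPU's GPUBuffer / GPUCommandEncoder /
// GPUDevice, so the sketch compiles and runs without a browser.
interface BufferLike {
  getMappedRange(): ArrayBuffer;
  unmap(): void;
  destroy(): void;
}

interface EncoderLike {
  copyBufferToBuffer(src: BufferLike, srcOffset: number, dst: BufferLike, dstOffset: number, size: number): void;
  finish(): unknown;
}

interface DeviceLike {
  createBuffer(desc: { size: number; usage: number; mappedAtCreation?: boolean }): BufferLike;
  createCommandEncoder(): EncoderLike;
  queue: { submit(commandBuffers: unknown[]): void };
}

// Values of GPUBufferUsage.MAP_WRITE and GPUBufferUsage.COPY_SRC in the
// WebGPU spec, inlined so no browser globals are needed.
const MAP_WRITE = 0x0002;
const COPY_SRC = 0x0004;

// Uploads `data` into `dst` (assumed to have COPY_DST usage and enough room)
// through a short-lived staging buffer.
function uploadEagerly(device: DeviceLike, dst: BufferLike, data: Uint8Array): void {
  // Sizes of mappedAtCreation buffers and of buffer copies must be multiples of 4.
  const size = Math.ceil(data.byteLength / 4) * 4;
  const staging = device.createBuffer({ size, usage: MAP_WRITE | COPY_SRC, mappedAtCreation: true });
  new Uint8Array(staging.getMappedRange()).set(data);
  staging.unmap();

  // Record the copy into a dedicated encoder and submit it immediately,
  // instead of batching it with later compute work.
  const encoder = device.createCommandEncoder();
  encoder.copyBufferToBuffer(staging, 0, dst, 0, size);
  device.queue.submit([encoder.finish()]);

  // The copy is already queued, so the staging buffer can be destroyed now;
  // the implementation may reclaim its memory once the copy completes.
  staging.destroy();
}
```

In a browser, the same steps would run against a real `GPUDevice` and a destination `GPUBuffer` created with `GPUBufferUsage.COPY_DST`. The trade-off of this design is one queue submission per upload in exchange for much shorter staging-buffer lifetimes.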
