[webgpu] Update shader to support non module-level scoping function #6918

haoyunfeix · 2022-10-09T01:33:19Z

FIXES #6842
Declare user-defined functions before the entry point function to support shader translation libraries that do not yet implement module scoping, such as naga (gfx-rs/naga#2075).


hujiajie · 2022-10-09T03:29:45Z

Can we add a generator function foobar() in webgpu_program.ts, call the first step of foobar() in getMainHeaderString() and get a string like fn main(), then after getUserCode() returns, call the second step of foobar() to get the matching definition of _start()?
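The two-step generator idea above could be sketched roughly as follows. This is only an illustration: `entryPointSteps` stands in for the `foobar()` placeholder, and `buildShader`, the WGSL strings and the workgroup size are assumptions, not code from this PR.

```typescript
// Hypothetical two-step generator: step one yields the header of the user
// function, step two yields the entry point that calls it.
function* entryPointSteps(): Generator<string, void, void> {
  // Step 1: header under which the user code body is emitted.
  yield 'fn main()';
  // Step 2: entry point definition, emitted after the user code.
  // The workgroup size here is an arbitrary placeholder.
  yield [
    '@compute @workgroup_size(64, 1, 1)',
    'fn _start() {',
    '  main();',
    '}',
  ].join('\n');
}

// Illustrative assembly; userBody stands in for what getUserCode() returns.
function buildShader(userBody: string): string {
  const steps = entryPointSteps();
  const header = steps.next().value as string;
  const userCode = `${header} {\n${userBody}\n}`;
  const entryPoint = steps.next().value as string;
  // The user function is placed before the entry point that refers to it.
  return [userCode, entryPoint].join('\n');
}

const shaderSource = buildShader('  // user code goes here');
```

Because the generator keeps its position between the two `next()` calls, the header and the entry point stay paired even though the user code is produced in between.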

haoyunfeix · 2022-10-11T07:52:28Z

@qjia7 @xhcao @axinging @hujiajie @gyagp PTAL

gyagp · 2022-10-11T11:56:24Z

tfjs-backend-webgpu/src/matmul_packed_webgpu.ts

    let localRow = i32(localId.y);
    let tileRow = ${isVectorA ? '0' : 'localRow * RowPerThread'};
    let tileCol = i32(localId.x);

    let globalRow = ${isVectorA ? '0' : 'i32(globalId.y) * RowPerThread'};
    let globalCol = i32(globalId.x);
    let batch = ${splitK ? '0' : 'i32(globalId.z)'};
-    let globalRowStart = i32(workgroupId.y) * ${tileAOuter};
+    let globalRowStart = i32(workGroupId.y) * ${tileAOuter};


workgroup should be a single word?

gyagp · 2022-10-11T12:10:07Z

tfjs-backend-webgpu/src/scatter_webgpu.ts

@@ -49,7 +50,7 @@ export class ScatterProgram implements WebGPUProgram {
    this.shaderKey = `scatter_${indicesRank}_${updatesRank}_${
        this.sliceDimGreaterThanOne}_${outputDtype}_${sumDupeIndices}`;
    const stridesType = getCoordsDataType(strides.length);
-    this.uniforms = `sliceDim : i32, strides: ${stridesType}, size: i32,`;
+    this.uniforms = `sliceDim : i32, strides: ${stridesType}, sizeUpdate: i32,`;


sizeUpdate -> updateSize?

qjia7

program.size does not seem like a good name, but we can do the renaming in a separate PR.

qjia7 · 2022-10-12T04:46:35Z

tfjs-backend-webgpu/src/depthwise_conv2d_nchw_shared_webgpu.ts

@@ -116,11 +112,12 @@ export class DepthwiseConv2DNCHWSharedProgram implements WebGPUProgram {
        }

        // Load one tile of W into local memory.
-        var wIndex = localIndex;
+        var wIndex = localIndexI;


How about var wIndex = i32(localIndex);? Then you can remove L89?

Yes, updated.

qjia7 · 2022-10-12T04:50:05Z

tfjs-backend-webgpu/src/scatter_webgpu.ts

@@ -49,7 +50,7 @@ export class ScatterProgram implements WebGPUProgram {
    this.shaderKey = `scatter_${indicesRank}_${updatesRank}_${
        this.sliceDimGreaterThanOne}_${outputDtype}_${sumDupeIndices}`;
    const stridesType = getCoordsDataType(strides.length);
-    this.uniforms = `sliceDim : i32, strides: ${stridesType}, size: i32,`;
+    this.uniforms = `sliceDim : i32, strides: ${stridesType}, sizeUpdate: i32,`;


Use updatesSize since the buffer name is updates.

qjia7 · 2022-10-12T05:13:32Z

tfjs-backend-webgpu/src/webgpu_program.ts

@@ -144,6 +124,26 @@ export function getMainHeaderString(...params: string[]): string {
  return snippet;
 }

+export function getStartHeaderString(isUseIndex: boolean): string {


rename: isUseIndex -> useGlobalIndex?

qjia7 · 2022-10-12T05:14:36Z

tfjs-backend-webgpu/src/webgpu_program.ts

      var<private> globalId: vec3<u32>;
      var<private> numWorkgroups: vec3<u32>;
+      var<private> workGroupId: vec3<u32>;


workGroupId -> workgroupId

qjia7 · 2022-10-12T05:17:02Z

tfjs-backend-webgpu/src/webgpu_program.ts

@@ -198,6 +198,7 @@ function makeShader(
      prefixSnippets.join('\n'),
      getCoordsFromIndexSnippet(outputData.shape),
      program.getUserCode(),
+      getStartHeaderString(program.size),


Maybe just use getStartHeaderString(true) so that you don't need to change from_pixels_webgpu.ts?

On second thought, I would prefer to pass whether the dispatch layout is flat as the parameter of getStartHeaderString, which is easier to understand. The same applies to L283.

    const isFlatDispatchLayout =
        (program.dispatchLayout.y === null && program.dispatchLayout.z === null) ||
        (program.dispatchLayout.y.length === 0 && program.dispatchLayout.z.length === 0);
    getStartHeaderString(isFlatDispatchLayout);

Add a function to do this, PTAL.
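A helper along these lines might look like the sketch below. The `DispatchLayout` interface and the exact signature are assumptions made for illustration and may differ from what was committed; checking each axis on its own also covers the mixed case where one axis is null and the other is an empty array.

```typescript
// Assumed shape of a dispatch layout: y and z may be missing, null or empty.
interface DispatchLayout {
  x: number[];
  y?: number[] | null;
  z?: number[] | null;
}

// Returns true when only the x axis carries output dimensions.
function isFlatDispatchLayout(layout: DispatchLayout): boolean {
  const isEmpty = (dims?: number[] | null): boolean =>
      dims == null || dims.length === 0;
  return isEmpty(layout.y) && isEmpty(layout.z);
}

// Illustrative usage with made-up layouts.
const flatLayout: DispatchLayout = {x: [0, 1, 2]};
const layout3D: DispatchLayout = {x: [2], y: [1], z: [0]};
const flatResult = isFlatDispatchLayout(flatLayout);
const nonFlatResult = isFlatDispatchLayout(layout3D);
```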

qjia7 · 2022-10-12T05:18:01Z

tfjs-backend-webgpu/src/webgpu_program.ts

-               return i32((workGroupID.z * numWorkgroups.x * numWorkgroups.y +
-                   workGroupID.y * numWorkgroups.x + workGroupID.x) *
+               return i32((workGroupId.z * numWorkgroups.x * numWorkgroups.y +
+                   workGroupId.y * numWorkgroups.x + workGroupId.x) *
                   (workGroupSizeX * workGroupSizeY * workGroupSizeZ) +
                   localInvocationIndex);


The localInvocationIndex can also be replaced by localIndex?

Yes, that may also reduce computing cost.
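For reference, the arithmetic in the snippet above can be mirrored on the CPU as in the sketch below. `flatGlobalIndex` and the `Vec3` alias are hypothetical names used only for this illustration; the parameters correspond to the shader's workgroup id, number of workgroups, workgroup size and local index.

```typescript
type Vec3 = [number, number, number];

// Mirrors the shader expression: flatten the 3D workgroup id, scale it by
// the number of invocations per workgroup, then add the local index.
function flatGlobalIndex(
    workgroupId: Vec3, numWorkgroups: Vec3, workgroupSize: Vec3,
    localIndex: number): number {
  const [gx, gy, gz] = workgroupId;
  const [nx, ny] = numWorkgroups;
  const flatWorkgroup = gz * nx * ny + gy * nx + gx;
  const invocationsPerWorkgroup =
      workgroupSize[0] * workgroupSize[1] * workgroupSize[2];
  return flatWorkgroup * invocationsPerWorkgroup + localIndex;
}

// Workgroup (1, 1, 0) in a 2x2x1 grid with 4 invocations per workgroup.
const exampleIndex = flatGlobalIndex([1, 1, 0], [2, 2, 1], [4, 1, 1], 2);
```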

gyagp

LGTM. We need another PR to change all the occurrences of workgroup to a single word.

qjia7

LGTM with one nit.

qjia7 · 2022-10-13T00:36:29Z

tfjs-backend-webgpu/src/depthwise_conv2d_webgpu.ts

              }
            }
+            ${biasActivationSnippet(this.addBias, this.activation)}
+          if (coordsInBounds4D(coords, uniforms.outShape)) {


Since you already used if (index < uniforms.size), if (coordsInBounds4D(coords, uniforms.outShape)) is not needed anymore.

Thanks, updated.
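The reasoning behind this nit can be checked with a small sketch, assuming coordinates are derived from the flat index by row-major division. `coordsFromIndex` and `coordsInBounds` are illustrative stand-ins written for this example, not the backend's helpers, and the shape is made up.

```typescript
// Converts a flat row-major index into coordinates for the given shape.
function coordsFromIndex(index: number, shape: number[]): number[] {
  const coords: number[] = [];
  let rest = index;
  for (let d = 0; d < shape.length; d++) {
    // The stride of dimension d is the product of all later dimensions.
    const stride = shape.slice(d + 1).reduce((a, b) => a * b, 1);
    const c = Math.floor(rest / stride);
    coords.push(c);
    rest -= c * stride;
  }
  return coords;
}

// True when every coordinate lies inside its dimension.
function coordsInBounds(coords: number[], shape: number[]): boolean {
  return coords.every((c, d) => c >= 0 && c < shape[d]);
}

const outShape = [2, 3, 4, 5];
const size = outShape.reduce((a, b) => a * b, 1);

// Every index below size maps to in-bounds coordinates.
let allInBounds = true;
for (let index = 0; index < size; index++) {
  if (!coordsInBounds(coordsFromIndex(index, outShape), outShape)) {
    allInBounds = false;
  }
}

// The first index at or beyond size falls outside the shape.
const firstOutOfRange = coordsInBounds(coordsFromIndex(size, outShape), outShape);
```

Under that assumption the index check and the coordinate check accept exactly the same invocations, so one of them is enough.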

gyagp reviewed Oct 11, 2022

qjia7 reviewed Oct 12, 2022

haoyunfeix added 3 commits October 12, 2022 17:18

[webgpu] Update shader to support non module-level scoping function

1f5c878

FIXES tensorflow#6842 To support shader translation library which does not implement module scoping like naga

Unify kernels

a10518b

use main() to generate user function and getStartHeaderString() to make entry point function

Use isFlatPatchLayout to determine main header

c2c070d

And address comments

haoyunfeix force-pushed the webgpu_upd_sdr_func branch from ca06402 to c2c070d on October 12, 2022 09:18

gyagp approved these changes Oct 12, 2022

qjia7 approved these changes Oct 13, 2022

haoyunfeix added 2 commits October 13, 2022 13:46

remove unnecessary scope checking

fcb246e

Merge branch 'master' into webgpu_upd_sdr_func

0049383

qjia7 merged commit 6b17a42 into tensorflow:master Oct 13, 2022

haoyunfeix mentioned this pull request Oct 13, 2022

LayersModel#predict() results in all zeros when using WebGPU backend in Deno #6842

Closed

hujiajie mentioned this pull request Nov 3, 2022

webgpu: support bincount and denseBincount operators #6994

Merged
