SPIR-V: prune unreachable merge block and continue target #1943

Merged
merged 1 commit into from
Nov 3, 2019


dneto0
Contributor

dneto0 commented Oct 22, 2019

Within a function, only emit a block if:

  • it's reachable from the entry block by zero or more branches,
  • it's an unreachable merge block, in which case it's just OpLabel and OpUnreachable
  • it's an unreachable continue target, in which case it's OpLabel and an unconditional branch to the corresponding loop header.

Includes tests showing interesting cases.
This is WIP. I haven't updated the existing tests that are affected by this change.
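The three emission rules above can be illustrated on a toy control-flow graph. This is a minimal sketch, not glslang's code: the Block struct, the Emit enum and classify() are hypothetical, and the real builder walks spv::Block objects in its inReadableOrder traversal rather than integer indices.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy CFG node; glslang's spv::Block is richer than this.
struct Block {
    std::vector<int> successors;  // targets of the block's branch instruction
    int merge = -1;               // merge block, if this is a structured header
    int continueTarget = -1;      // continue target, if this is a loop header
};

enum class Emit { Normal, DeadMerge, DeadContinue, Skip };

// Classify every block according to the three emission rules.
std::vector<Emit> classify(const std::vector<Block>& blocks, int entry) {
    assert(entry >= 0 && entry < static_cast<int>(blocks.size()));
    std::vector<Emit> result(blocks.size(), Emit::Skip);

    // Rule 1: blocks reachable from the entry by zero or more branches.
    std::set<int> reachable;
    std::vector<int> worklist{entry};
    while (!worklist.empty()) {
        int b = worklist.back();
        worklist.pop_back();
        if (!reachable.insert(b).second) continue;  // already visited
        for (int s : blocks[b].successors) worklist.push_back(s);
    }
    for (int b : reachable) result[b] = Emit::Normal;

    // Rules 2 and 3: a reachable header's merge block and continue target must
    // still be emitted, in canonical form, even if no branch reaches them.
    for (int h : reachable) {
        int m = blocks[h].merge;
        int c = blocks[h].continueTarget;
        if (m >= 0 && !reachable.count(m)) result[m] = Emit::DeadMerge;
        if (c >= 0 && !reachable.count(c)) result[c] = Emit::DeadContinue;
    }
    return result;  // everything else stays Skip and is not emitted
}
```

For example, if block 1 is a loop header whose body (block 2) returns, the continue target (block 3) and merge block (block 4) receive the canonical dead forms, and block 5, which only the dead merge block branches to, is skipped entirely. In this model, that is the kind of large deletion the review notices in hlsl.doLoop.frag.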

@dneto0
Contributor Author

dneto0 commented Oct 22, 2019

Existing tests affected:

[==========] 1372 tests from 54 test suites ran. (4038 ms total)
[ PASSED ] 1354 tests.
[ FAILED ] 18 tests, listed below:
[ FAILED ] ToSpirv/HlslCompileTest.FromFile/hlsl_constantbuffer_frag, where GetParam() = 16-byte object <09-B3 9D-00 01-00 00-00 F3-AE 9D-00 01-00 00-00>
[ FAILED ] ToSpirv/HlslCompileTest.FromFile/hlsl_doLoop_frag, where GetParam() = 16-byte object <BB-B3 9D-00 01-00 00-00 08-AF 9D-00 01-00 00-00>
[ FAILED ] ToSpirv/HlslCompileTest.FromFile/hlsl_forLoop_frag, where GetParam() = 16-byte object <34-B5 9D-00 01-00 00-00 08-AF 9D-00 01-00 00-00>
[ FAILED ] ToSpirv/HlslCompileTest.FromFile/hlsl_if_frag, where GetParam() = 16-byte object <64-B8 9D-00 01-00 00-00 08-AF 9D-00 01-00 00-00>
[ FAILED ] ToSpirv/HlslCompileTest.FromFile/hlsl_structbuffer_frag, where GetParam() = 16-byte object <C3-C6 9D-00 01-00 00-00 F3-AE 9D-00 01-00 00-00>
[ FAILED ] ToSpirv/HlslCompileTest.FromFile/hlsl_structbuffer_coherent_frag, where GetParam() = 16-byte object <54-C7 9D-00 01-00 00-00 F3-AE 9D-00 01-00 00-00>
[ FAILED ] ToSpirv/HlslCompileTest.FromFile/hlsl_structbuffer_rw_frag, where GetParam() = 16-byte object <E7-C7 9D-00 01-00 00-00 F3-AE 9D-00 01-00 00-00>
[ FAILED ] ToSpirv/HlslVulkan1_1CompileTest.FromFile/hlsl_wavequery_frag, where GetParam() = 16-byte object <E8-CB 9D-00 01-00 00-00 08-AF 9D-00 01-00 00-00>
[ FAILED ] Glsl/CompileVulkanToSpirvTest.FromFile/spv_for_notest_vert, where GetParam() = "spv.for-notest.vert"
[ FAILED ] Glsl/CompileVulkanToSpirvTest.FromFile/spv_controlFlowAttributes_frag, where GetParam() = "spv.controlFlowAttributes.frag"
[ FAILED ] Glsl/CompileVulkanToSpirvTest.FromFile/spv_earlyReturnDiscard_frag, where GetParam() = "spv.earlyReturnDiscard.frag"
[ FAILED ] Glsl/CompileVulkanToSpirvTest.FromFile/spv_forwardFun_frag, where GetParam() = "spv.forwardFun.frag"
[ FAILED ] Glsl/CompileVulkanToSpirvTest.FromFile/spv_functionCall_frag, where GetParam() = "spv.functionCall.frag"
[ FAILED ] Glsl/CompileVulkanToSpirvTest.FromFile/spv_merge_unreachable_frag, where GetParam() = "spv.merge-unreachable.frag"
[ FAILED ] ToSpirv/RemapTest.FromFile/remap_similar_1a_none_frag, where GetParam() = 24-byte object <D8-F5 9D-00 01-00 00-00 F3-AE 9D-00 01-00 00-00 00-00 00-00 00-00 00-00>
[ FAILED ] ToSpirv/RemapTest.FromFile/remap_similar_1b_none_frag, where GetParam() = 24-byte object <F3-F5 9D-00 01-00 00-00 F3-AE 9D-00 01-00 00-00 00-00 00-00 00-00 00-00>
[ FAILED ] ToSpirv/RemapTest.FromFile/remap_similar_1a_everything_frag, where GetParam() = 24-byte object <0E-F6 9D-00 01-00 00-00 F3-AE 9D-00 01-00 00-00 00-00 00-00 FF-00 00-00>
[ FAILED ] ToSpirv/RemapTest.FromFile/remap_similar_1b_everything_frag, where GetParam() = 24-byte object <2F-F6 9D-00 01-00 00-00 F3-AE 9D-00 01-00 00-00 00-00 00-00 FF-00 00-00>

@johnkslang
Member

I generated the test changes locally (easier to include them and have them here to see and comment on). Most look okay, but some are losing test cases now:

  • hlsl.doLoop.frag: There is a do { return; } while () in the middle. I expected to see XXXX disappear in do { return; XXXX; } while (), but not the entire rest of the shader.
  • hlsl.forLoop.frag: a similar problem, starting with for (;;) ;
  • spv.controlFlowAttributes.frag: also for (;;) { }
  • hlsl.if.frag: probably need to just move the if () return; else return; to be the last thing.

Would like confirmation that this change should not cross the line into being an actual optimization. I think it's not, because at the physical CFG level there actually were no branches to the beginning of the code that disappeared. Yes?

It will still be necessary to change the test sources a bit so that most of their results are not eliminated by this change. I think I identified the necessary changes above.

Member

johnkslang left a comment

Looks good. Just a couple questions and then nits about MSVC and indentation.

Block* merge = merge_and_header.first;
merge->forceDeadMerge();
}
for (auto continue_and_header : headerForUnreachableContinue) {
Member

This range-based for (x : y) loop is one of those C++11 features not initially supported in MSVC. There is another PR wanting to add = default, which is a similar situation. It's pretty trivial to avoid this, but maybe it's time to try again and see if anyone still complains.

Contributor Author

ack. I'll fix in a subsequent patch

Contributor Author

Fixed in the next commit by using iterator iteration.
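The fix swaps the range-based for over the map for explicit iterators. A minimal sketch of that pattern, assuming an ordered std::map whose int keys and values stand in for the Block pointers held in headerForUnreachableContinue; collectPairs() is a hypothetical helper, not part of the PR.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Assumed stand-in for headerForUnreachableContinue: maps an unreachable continue
// target to its loop header. Ints replace the Block pointers used in the PR.
typedef std::map<int, int> ContinueToHeader;

// Explicit iterator loop instead of "for (auto x : map)"; this form also compiles
// with toolchains that predate C++11 range-based for.
std::vector<std::pair<int, int> > collectPairs(const ContinueToHeader& headerForUnreachableContinue) {
    std::vector<std::pair<int, int> > pairs;
    for (ContinueToHeader::const_iterator it = headerForUnreachableContinue.begin();
         it != headerForUnreachableContinue.end(); ++it) {
        // it->first is the continue target, it->second the loop header.
        pairs.push_back(std::make_pair(it->first, it->second));
    }
    return pairs;
}
```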

SPIRV/spvIR.h Outdated
// The different reasons for reaching a block in the inReadableOrder traversal.
typedef enum ReachReason {
// Reachable from the entry block via transfers of control, i.e. branches.
ReachViaControlFlow = 0,
Member

Four-space indent.

Contributor Author

ack.

Contributor Author

fixed in the next commit.

SPIRV/spvIR.h Outdated
// as necessary. A canonical dead merge block has only an OpLabel and an
// OpUnreachable.
void forceDeadMerge() {
assert(localVariables.empty());
Member

This all looks like 7-space indentation; probably you intended 8.

Contributor Author

Fixed in the next commit.
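The canonical forms described in this thread can be sketched on a simplified block. This assumes a hypothetical ToyBlock/Instr representation: only the forceDeadMerge() name and its localVariables assertion come from the diff excerpt, and forceDeadContinue() is an invented counterpart for the continue-target rule in the PR description.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy instruction: opcode name plus a target label id (-1 if none).
struct Instr {
    std::string opcode;
    int target;
};

// Hypothetical simplified block; the instructions list excludes the OpLabel.
struct ToyBlock {
    int labelId;
    std::vector<Instr> instructions;
    std::vector<Instr> localVariables;  // OpVariable instructions, if any

    // Canonical dead merge block: OpLabel followed only by OpUnreachable.
    void forceDeadMerge() {
        assert(localVariables.empty());
        instructions.assign(1, Instr{"OpUnreachable", -1});
    }

    // Hypothetical counterpart for a dead continue target:
    // OpLabel followed only by an unconditional OpBranch to the loop header.
    void forceDeadContinue(int headerLabelId) {
        assert(localVariables.empty());
        instructions.assign(1, Instr{"OpBranch", headerLabelId});
    }
};
```

A real implementation presumably also has to keep predecessor and successor bookkeeping consistent after discarding the old terminator; this sketch ignores that.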

return;
o = 3;
}
// This is considered reachable since we don't assume
Member

To put it more strongly: we should be quite clear that this PR is not an optimization and has no smarts, other than taking away code that was not hooked up to the control flow.

Is that all true?

Contributor Author

Yes, that's the intent.

SPIRV/SpvPostProcess.cpp (resolved)
@dneto0
Contributor Author

dneto0 commented Oct 28, 2019

I added an option to get back to the old behaviour.
I added tests for both behaviours.

dneto0 changed the title from "WIP: avoid emitting dead code where possible" to "avoid emitting dead code where possible" on Oct 28, 2019
dneto0 changed the title from "avoid emitting dead code where possible" to "SPIR-V: prune unreachable merge block and continue target" on Oct 28, 2019
@dneto0
Contributor Author

dneto0 commented Oct 28, 2019

I think this is ready for review now.
I squashed the history to avoid ugliness.

@johnkslang
Member

Quoting dneto0: "I added an option to get back to the old behaviour."

I really hope this is non-modal. There should be the one right way to do this, with one set of test results.

The amount of change seemed fine earlier (haven't caught up with the latest round of changes).

@dneto0
Contributor Author

dneto0 commented Oct 29, 2019

Quoting johnkslang: "I really hope this is non-modal. There should be the one right way to do this, with one set of test results."

Point taken.

I confirmed that for the top-of-tree master branch of Vulkan CTS, all the shaders compile and pass SPIR-V validation.

I'm continuing to investigate.

@dneto0
Contributor Author

dneto0 commented Oct 29, 2019

The Vulkan CTS has had targeted OpUnreachable tests since November 2015, i.e. before Vulkan 1.0. See this version of the SPIR-V assembly-based tests: https://github.com/KhronosGroup/VK-GL-CTS/blob/6632073b4f79df50f93b27c3d2ed005d21f2c367/external/vulkancts/modules/vulkan/spirv_assembly/vktSpvAsmInstructionTests.cpp#L315

The interesting cases include:

I'm attaching a .csv file that analyzes the test diffs in the current version of the patch.
They fall into the following cases:

  • nonvoid fn: tested as above
  • restructured: Original GLSL rewritten to avoid wiping away most of the codegen output, generally by calling out to functions, each with its own isolated case, e.g. an infinite loop.
  • branch-to-unreach: tested as above
  • "nonvoid fn: id rename": Like nonvoid fn, but one branch ID is changed by spirv-remap.
  • new test: A test added for this PR.

glslang-pruning-dead-code-cases.csv.txt

I'm pretty happy with the existing test coverage. So I'll update the PR to make the new behaviour the one and only behaviour. That will simplify the code and testing.

More aggressively prune unreachable code as follows.
When no control flow edges reach a merge block or continue target:
- delete their contents so that:
  - a merge block becomes OpLabel, then OpUnreachable
  - a continue target becomes OpLabel, then an OpBranch back to the
    loop header
- any basic block which is dominated by such a merge block or continue
  target is removed as well.
- decorations targeting the removed instructions are removed.

Enables the SPIR-V builder post-processing step in the GLSLANG_WEB case.
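The last rule above, dropping decorations that target removed instructions, amounts to filtering the decoration list by a set of deleted result IDs. A minimal sketch, assuming a hypothetical Decoration record and an already-collected removedIds set; the real builder works on its own instruction objects, so its code differs.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical decoration record: the id it targets plus an opaque decoration kind.
struct Decoration {
    unsigned targetId;
    unsigned kind;
};

// Drop every decoration whose target id was deleted along with a pruned block.
void pruneDecorations(std::vector<Decoration>& decorations,
                      const std::unordered_set<unsigned>& removedIds) {
    decorations.erase(
        std::remove_if(decorations.begin(), decorations.end(),
                       [&](const Decoration& d) { return removedIds.count(d.targetId) != 0; }),
        decorations.end());
}
```

The erase/remove_if idiom keeps the surviving decorations in their original order.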
@dneto0
Contributor Author

dneto0 commented Oct 29, 2019

I think this is ready for review, and that the questions have hopefully been addressed.

@johnkslang
Member

We should bump up the GetSpirvGeneratorVersion() since code gen is changing.

Also, tag, etc.

I will do these, before merging.
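Bumping GetSpirvGeneratorVersion() changes the generator word (word 2) of every emitted module header, which is why the expected test results change as well. By the Khronos SPIR-V registry convention, that word holds a registered tool ID in its high 16 bits and a tool-specific version in its low 16 bits. A minimal sketch: 8 is the tool ID registered for the Khronos glslang reference front end, while the version numbers used here are placeholders, not glslang's actual history.

```cpp
#include <cassert>
#include <cstdint>

// Pack a tool ID (high 16 bits) and a tool-specific version (low 16 bits)
// into the SPIR-V header's generator word.
constexpr std::uint32_t makeGeneratorWord(std::uint32_t toolId, std::uint32_t version) {
    return (toolId << 16) | (version & 0xFFFFu);
}

constexpr std::uint32_t toolIdOf(std::uint32_t word) { return word >> 16; }
constexpr std::uint32_t versionOf(std::uint32_t word) { return word & 0xFFFFu; }

// Registered tool ID for glslang; the version 3 below is only a placeholder.
constexpr std::uint32_t kGlslangToolId = 8;
static_assert(toolIdOf(makeGeneratorWord(kGlslangToolId, 3)) == 8, "tool id round-trips");
```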

// based on the resulting SPIR-V.
// Note: WebGPU code generation must have the opportunity to aggressively
Member

This turned on lots of code having to do with types and capabilities that should not be relevant to WebGPU.

Don't we need to just turn on the CFG topology part?

I can look at that too.

Member

Well, just reply with the specific subset needed by WebGPU, and I'll do the experiment/pruning of other things and verify how size is impacted.

(First, I'll merge this, and then do that as a second step to get space back for WebGPU.)

Contributor Author

Quoting johnkslang: "Don't we need to just turn on the CFG topology part?"

In principle, yes.

But when I looked more deeply at Builder::postProcess, I became worried that the WEB path was missing functionality, particularly around adding per-instruction capabilities and extensions.
Looking again, at the moment those additional things affect only:

  • 8bit and 16bit storage
  • physical storage

Neither of these is in the WEB variant for now, so that code space could be saved.
I can take on getting that code space back.

Contributor Author

I posted #1968 to reclaim the codespace wasted by this part of the change.

Member

That looks like it solves it, thanks.

@@ -268,10 +299,24 @@ class Block {
bool unreachable;
};

// The different reasons for reaching a block in the inReadableOrder traversal.
typedef enum ReachReason {
Member

What's the purpose of typedef here without also declaring a name? I'm getting warnings that it has no purpose. Does it work just as well to remove the typedef?

Member

Note, I removed the typedef.
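With the typedef removed and four-space indentation, the declaration would look roughly like this. Only ReachViaControlFlow and its comments come from the diff excerpt; ReachDeadContinue, ReachDeadMerge and emittedForm() are assumed names for illustration.

```cpp
#include <cassert>
#include <string>

// The different reasons for reaching a block in the inReadableOrder traversal.
// A plain enum declaration: "typedef enum X { ... };" with no alias name after
// the closing brace declares nothing extra, which compilers warn about.
enum ReachReason {
    // Reachable from the entry block via transfers of control, i.e. branches.
    ReachViaControlFlow = 0,
    // Assumed name: a continue target that no branch reaches.
    ReachDeadContinue,
    // Assumed name: a merge block that no branch reaches.
    ReachDeadMerge
};

// Hypothetical helper mapping each reason to the form the block is emitted in.
inline std::string emittedForm(ReachReason reason) {
    switch (reason) {
    case ReachViaControlFlow:
        return "original instructions";
    case ReachDeadContinue:
        return "OpLabel, OpBranch to loop header";
    case ReachDeadMerge:
        return "OpLabel, OpUnreachable";
    }
    return "unknown";
}
```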

@johnkslang
Member

I'm going to push this now, with some modifications, including new test results due to bumping up the generator version number.

@johnkslang johnkslang merged commit 8c3d5b4 into KhronosGroup:master Nov 3, 2019
dneto0 added a commit to dneto0/glslang that referenced this pull request Nov 7, 2019
The SPIR-V post-processing to discover capabilities and
extensions does not apply to WebGPU compilation.  So don't include
that code.

This reclaims some of the code space added by KhronosGroup#1943
2 participants