[AutoScheduler] Remove `max_registers_per_block` in HardwareParams #7040

merrymercy · 2020-12-05T05:45:14Z

Previously, VerifyGPUCode used hardware_params->max_registers_per_block, obtained from the CUDA device query, as the value of max_local_memory_per_block. This is wrong: the two quantities are not the same thing.
Luckily, for NVIDIA GPUs this bug does not affect performance, because kMaxRegistersPerBlock returns a very large value, so the check in VerifyGPUCode against that value has almost no effect.

We have to rename hardware_params->max_registers_per_block to the correct name, hardware_params->max_local_memory_per_block, so that it is more meaningful for other backends.

A better approach is to set it to INT32_MAX, which simply skips this check, because the CUDA runtime imposes no hard limit on this value. Setting it to INT32_MAX enlarges the search space while keeping most of the measured schedules valid.
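To illustrate why INT32_MAX effectively disables the check, here is a minimal sketch. It is not the actual VerifyGPUCode implementation: `GpuLimits`, `KernelUsage`, and `PassesMemoryChecks` are hypothetical names made up for this example, and the real pass may collect and compare usage differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the limits taken from HardwareParams.
struct GpuLimits {
  int max_shared_memory_per_block;
  int max_local_memory_per_block;
};

// Hypothetical per-block resource usage of one candidate schedule, in bytes.
struct KernelUsage {
  int shared_memory_bytes;
  int local_memory_bytes;
};

// Returns true when the candidate stays within both limits.
// With max_local_memory_per_block == INT32_MAX, the second comparison
// is true for every int value, so only the shared memory check remains.
bool PassesMemoryChecks(const KernelUsage& usage, const GpuLimits& limits) {
  assert(usage.shared_memory_bytes >= 0 && usage.local_memory_bytes >= 0);
  return usage.shared_memory_bytes <= limits.max_shared_memory_per_block &&
         usage.local_memory_bytes <= limits.max_local_memory_per_block;
}
```

For example, a candidate that uses 8 KiB of local memory would be rejected under a limit of `4 * 1024` but accepted under `INT32_MAX`, which is how the change enlarges the search space.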

merrymercy · 2020-12-05T05:54:00Z

cc @jcf94

comaniac · 2020-12-05T05:57:44Z

src/auto_scheduler/search_task.cc

@@ -64,8 +64,7 @@ HardwareParams HardwareParamsNode::GetDefaultHardwareParams(const Target& target
    device_api->GetAttr(ctx, tvm::runtime::DeviceAttrKind::kMaxSharedMemoryPerBlock, &ret);
    int max_shared_memory_per_block = ret;

-    device_api->GetAttr(ctx, tvm::runtime::DeviceAttrKind::kMaxRegistersPerBlock, &ret);
-    int max_registers_per_block = ret;
+    int max_local_memory_per_block = INT32_MAX;


Please add a comment here that explains this, as in the PR description.

comaniac · 2020-12-05T05:57:57Z

src/auto_scheduler/search_task.cc

                          max_threads_per_block, max_vthread_extent, warp_size);
  } else if (target->kind->device_type == kDLMetal) {
    // Reference: https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf
    // This setting looks working for Metal GPUs later than A10
    int max_shared_memory_per_block = 32 * 1024;
-    int max_registers_per_block = 4 * 1024;
+    int max_local_memory_per_block = INT32_MAX;


masahi · 2020-12-05T05:59:47Z

OK, I'll update my PR #7038 once this one is merged.

merrymercy · 2020-12-05T06:48:18Z

@comaniac Comments are addressed.

comaniac

LGTM

masahi · 2020-12-05T10:21:30Z

thanks @merrymercy @comaniac

…pache#7040) * [AutoScheduler] Fix hardware params * address comments

[AutoScheduler] Fix hardware params

ab2d432

merrymercy requested review from comaniac and masahi December 5, 2020 05:47

comaniac requested changes Dec 5, 2020

masahi mentioned this pull request Dec 5, 2020

[ROCm][Auto scheduler] Support Auto scheduler and NHWC convolution on ROCm #7038

Merged

address comments

26cd727

merrymercy force-pushed the pr-fix-hardware-param branch from f9e7def to 26cd727 Compare December 5, 2020 06:46

comaniac approved these changes Dec 5, 2020

masahi approved these changes Dec 5, 2020

masahi merged commit 878a0a9 into apache:main Dec 5, 2020

merrymercy deleted the pr-fix-hardware-param branch December 5, 2020 11:51

TusharKanekiDey pushed a commit to TusharKanekiDey/tvm that referenced this pull request Jan 20, 2021

[AutoScheduler] Remove max_registers_per_block in HardwareParams (a…

455ac2c

…pache#7040) * [AutoScheduler] Fix hardware params * address comments

trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Jan 21, 2021

[AutoScheduler] Remove max_registers_per_block in HardwareParams (a…

83da23f

…pache#7040) * [AutoScheduler] Fix hardware params * address comments

electriclilies pushed a commit to electriclilies/tvm that referenced this pull request Feb 18, 2021

[AutoScheduler] Remove max_registers_per_block in HardwareParams (a…

3b7d726

…pache#7040) * [AutoScheduler] Fix hardware params * address comments
