[AutoScheduler] Remove max_registers_per_block in HardwareParams #7040

Merged
merged 2 commits into from
Dec 5, 2020

Conversation

merrymercy
Member

@merrymercy merrymercy commented Dec 5, 2020

Previously, we used hardware_params->max_registers_per_block, obtained from the CUDA device query, as the value of max_local_memory_per_block in VerifyGPUCode. This is wrong: the two quantities are not the same thing.
Luckily, for NVIDIA GPUs this bug does not affect performance, because kMaxRegistersPerBlock returns a very large value, so the check in VerifyGPUCode against it has almost no effect.

We have to rename hardware_params->max_registers_per_block to the correct name, hardware_params->max_local_memory_per_block, so that it is more meaningful for other backends.

A better approach is to set it to INT32_MAX, which effectively skips this check, because the CUDA runtime imposes no hard limit on this value. Setting it to INT32_MAX can enlarge the search space while keeping most of the measured schedules valid.
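
To illustrate why an INT32_MAX limit turns the local-memory check into a no-op, here is a minimal sketch. It is not TVM's actual HardwareParams or VerifyGPUCode code: the struct names, the WithinLimits helper, and the 32 KiB shared-memory figure are assumptions made up for this example.

```cpp
#include <cstdint>

// Hypothetical stand-in for the hardware parameters discussed above;
// this is NOT TVM's actual HardwareParams definition.
struct HardwareLimits {
  std::int32_t max_shared_memory_per_block;
  std::int32_t max_local_memory_per_block;
};

// Hypothetical per-kernel resource usage, in bytes.
struct KernelUsage {
  std::int32_t shared_memory_bytes;
  std::int32_t local_memory_bytes;
};

// Returns true when the kernel stays within every limit.
constexpr bool WithinLimits(const KernelUsage& usage, const HardwareLimits& limits) {
  return usage.shared_memory_bytes <= limits.max_shared_memory_per_block &&
         usage.local_memory_bytes <= limits.max_local_memory_per_block;
}

// With the local-memory limit set to INT32_MAX, the second comparison can
// never fail for any int32 usage, so only shared memory is really checked.
// The 32 KiB shared-memory value is illustrative only.
constexpr HardwareLimits kUncheckedLocalMemory{32 * 1024, INT32_MAX};

// A limit mistakenly taken from an unrelated quantity (here an arbitrary
// 4 * 1024) would instead reject kernels that use more local memory.
constexpr HardwareLimits kMistakenLocalLimit{32 * 1024, 4 * 1024};
```

Under these assumptions, a kernel using 8 KiB of local memory passes against kUncheckedLocalMemory but is rejected against kMistakenLocalLimit, which is the kind of needless pruning of the search space the description is worried about on backends where the queried value is small.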

@merrymercy merrymercy requested review from comaniac and masahi December 5, 2020 05:47
@merrymercy
Member Author

cc @jcf94

@@ -64,8 +64,7 @@ HardwareParams HardwareParamsNode::GetDefaultHardwareParams(const Target& target
  device_api->GetAttr(ctx, tvm::runtime::DeviceAttrKind::kMaxSharedMemoryPerBlock, &ret);
  int max_shared_memory_per_block = ret;

- device_api->GetAttr(ctx, tvm::runtime::DeviceAttrKind::kMaxRegistersPerBlock, &ret);
- int max_registers_per_block = ret;
+ int max_local_memory_per_block = INT32_MAX;
Contributor

Add a code comment here that explains this, as in the PR description.

  max_threads_per_block, max_vthread_extent, warp_size);
  } else if (target->kind->device_type == kDLMetal) {
  // Reference: https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf
  // This setting looks working for Metal GPUs later than A10
  int max_shared_memory_per_block = 32 * 1024;
- int max_registers_per_block = 4 * 1024;
+ int max_local_memory_per_block = INT32_MAX;
Contributor

Ditto.

@masahi
Member

masahi commented Dec 5, 2020

ok I'll update my PR #7038 after we let this in first.

@merrymercy merrymercy force-pushed the pr-fix-hardware-param branch from f9e7def to 26cd727 Compare December 5, 2020 06:46
@merrymercy
Member Author

@comaniac Comments are addressed.

Contributor

@comaniac comaniac left a comment

LGTM

@masahi masahi merged commit 878a0a9 into apache:main Dec 5, 2020
@masahi
Member

masahi commented Dec 5, 2020

thanks @merrymercy @comaniac

@merrymercy merrymercy deleted the pr-fix-hardware-param branch December 5, 2020 11:51
TusharKanekiDey pushed a commit to TusharKanekiDey/tvm that referenced this pull request Jan 20, 2021
…pache#7040)

* [AutoScheduler] Fix hardware params

* address comments
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Jan 21, 2021
…pache#7040)

* [AutoScheduler] Fix hardware params

* address comments
electriclilies pushed a commit to electriclilies/tvm that referenced this pull request Feb 18, 2021
…pache#7040)

* [AutoScheduler] Fix hardware params

* address comments
3 participants