
[FEA] Allow to customize cooperative group size (tile_size) for static_map #194

Closed
ttnghia opened this issue Jul 23, 2022 · 7 comments
Labels
P1: Should have (Necessary but not critical), type: feature request (New feature request)

Comments

@ttnghia

ttnghia commented Jul 23, 2022

Currently, static_map hard-codes tile_size = 4, and this value is used by the insert and contains APIs. However, tile_size = 4 is not optimal and may cause performance regressions on some (if not most) systems, as I have observed in my own tests. For example, setting tile_size = 2 doubles the performance on my system.

It would be great if we could specify tile_size when constructing a static_map object, as we already can when constructing a static_multimap.
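To make the request concrete, here is a minimal host-side sketch (C++17) of a map whose cooperative-group size is a template parameter defaulting to the current value of 4. Everything in it is hypothetical: `toy_static_map`, its members, and the hash are illustrative names, not cuCollections API. The inner loop over `lane` stands in for the `CGSize` threads of a tile, each of which would inspect one slot of the current probing window in parallel on the GPU.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical model of a static_map with a configurable tile size (CGSize).
// Slots are grouped into windows of CGSize; one "tile" probes one window per step.
template <typename Key, typename Value, std::size_t Capacity, std::size_t CGSize = 4>
class toy_static_map {
  static_assert(Capacity % CGSize == 0, "capacity must be a multiple of CGSize");
  static constexpr Key empty_key = static_cast<Key>(-1);  // reserved sentinel

  struct slot {
    Key key;
    Value value;
  };
  std::array<slot, Capacity> slots_;

  // Hash the key to a window and return the index of that window's first slot.
  static std::size_t window_start(Key k) {
    std::size_t h = static_cast<std::size_t>(k) * 2654435761u;
    return (h % (Capacity / CGSize)) * CGSize;
  }

 public:
  static constexpr std::size_t cg_size = CGSize;

  toy_static_map() { slots_.fill(slot{empty_key, Value{}}); }

  // Returns false for the sentinel key, a duplicate key, or a full table.
  bool insert(Key k, Value v) {
    if (k == empty_key) return false;
    std::size_t start = window_start(k);
    for (std::size_t w = 0; w < Capacity / CGSize; ++w) {
      std::size_t base = (start + w * CGSize) % Capacity;
      // On the GPU, the CGSize lanes of the tile would check these slots in parallel.
      for (std::size_t lane = 0; lane < CGSize; ++lane) {
        slot& s = slots_[base + lane];
        if (s.key == k) return false;
        if (s.key == empty_key) {
          s = slot{k, v};
          return true;
        }
      }
    }
    return false;
  }

  std::optional<Value> find(Key k) const {
    if (k == empty_key) return std::nullopt;
    std::size_t start = window_start(k);
    for (std::size_t w = 0; w < Capacity / CGSize; ++w) {
      std::size_t base = (start + w * CGSize) % Capacity;
      for (std::size_t lane = 0; lane < CGSize; ++lane) {
        const slot& s = slots_[base + lane];
        if (s.key == k) return s.value;
        if (s.key == empty_key) return std::nullopt;  // key would have been placed here
      }
    }
    return std::nullopt;
  }

  bool contains(Key k) const { return find(k).has_value(); }
};
```

With this shape, `toy_static_map<int, int, 1024, 2>` would select tile_size = 2 at the declaration site, while existing code keeps the default of 4. Making the size a compile-time parameter rather than a runtime constructor argument is a design assumption here; it matches how device code usually needs a fixed tile width (e.g. for `cooperative_groups::tiled_partition<N>`).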

@ttnghia ttnghia added the type: feature request New feature request label Jul 23, 2022
@PointKernel PointKernel added the P1: Should have Necessary but not critical label Jul 25, 2022
@sleeepyjack
Collaborator

Probably a candidate for #110. We could also explore having dynamic CG sizes, e.g., CG=1 for when the table occupancy is low and then use a wider group once the table fills up.
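One way the dynamic-CG idea could be wired up, sketched in C++ under stated assumptions: because the tile width is usually a compile-time constant in device code, a runtime choice based on occupancy has to be mapped onto a small set of pre-instantiated kernels. `choose_cg_size`, `launch_insert`, and `dispatch_insert` are hypothetical names, `launch_insert` only prints instead of launching anything, and the load-factor thresholds are illustrative placeholders rather than measured values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical policy: map the current load factor (size / capacity) to a
// tile width. Thresholds are placeholders; integer math avoids floating point.
std::size_t choose_cg_size(std::size_t size, std::size_t capacity) {
  if (4 * size < capacity) return 1;      // under 25% full: collisions are rare
  if (2 * size < capacity) return 2;      // under 50% full
  if (4 * size < 3 * capacity) return 4;  // under 75% full
  return 8;                               // 75% full or more: longer probes, wider tile
}

// Stand-in for launching an insert kernel compiled for a fixed tile width
// (on the device this would be something like tiled_partition<CGSize>).
template <std::size_t CGSize>
std::size_t launch_insert(std::size_t num_keys) {
  std::printf("inserting %zu keys with CG size %zu\n", num_keys, CGSize);
  return CGSize;
}

// Bridge from the runtime choice to one of a fixed set of instantiations.
std::size_t dispatch_insert(std::size_t size, std::size_t capacity, std::size_t num_keys) {
  switch (choose_cg_size(size, capacity)) {
    case 1: return launch_insert<1>(num_keys);
    case 2: return launch_insert<2>(num_keys);
    case 4: return launch_insert<4>(num_keys);
    default: return launch_insert<8>(num_keys);
  }
}
```

A real implementation would also need lookups to find keys that were inserted with a different width, i.e. probe sequences that stay compatible as the group size changes; this sketch does not address that.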

@sleeepyjack
Collaborator

sleeepyjack commented Jul 26, 2022

@ttnghia I'm curious, what architecture did you run your benchmarks on?

@ttnghia
Author

ttnghia commented Jul 26, 2022

I'm running on a Quadro RTX 6000 (SM75).

@jrhemstad
Collaborator

@sleeepyjack I'm guessing it's the difference between GDDR and HBM. A larger tile_size performs better on HBM than on GDDR.

@sleeepyjack
Collaborator

sleeepyjack commented Jul 27, 2022

@jrhemstad I was thinking the same thing. A long time ago, I dreamed about having a compile time lookup table for choosing the optimal (default) CG size for a given architecture in WarpCore. Sounds wild, but hey, why not?

@jrhemstad
Collaborator

> @jrhemstad I was thinking the same thing. A long time ago, I dreamed about having a compile time lookup table for choosing the optimal (default) CG size for a given architecture in WarpCore. Sounds wild, but hey, why not?

That wouldn't be too hard. It would be similar to how CUB handles its device-specific tuning policies.

@PointKernel
Member

Completed in the new implementation


4 participants