[Dev][TL] Integrate TL Dequant Implementation into BitBLAS OPs #214

Merged on Oct 7, 2024 (49 commits)


LeiWang1999

This pull request includes significant changes to the bitblas library, mainly focusing on the addition of new schedulers, improvements to matrix multiplication operations, and updates to testing and dependencies. The most important changes are grouped into themes below.

New Schedulers and Enhancements:

  • Added support for float16 and int8 target data types in get_lop3_intrin_group function in bitblas/gpu/intrin/lop3.py. ([bitblas/gpu/intrin/lop3.pyR1680-R1690](https://github.com/microsoft/BitBLAS/pull/214/files#diff-15fd74b90c3b956e9864e35778f26b27f6c9a7cfae35037967f420fda9a0bbe5R1680-R1690))
  • Introduced a new scheduler for weight dequantization in bitblas/ops/general_matmul/__init__.py and updated the _select_scheduler method to return this new scheduler. ([[1]](https://github.com/microsoft/BitBLAS/pull/214/files#diff-74fe5dd2824cb03a0fb2b0a913a2fc5caeb9c08e5368c318cd32b3af7e6f52edR15), [[2]](https://github.com/microsoft/BitBLAS/pull/214/files#diff-74fe5dd2824cb03a0fb2b0a913a2fc5caeb9c08e5368c318cd32b3af7e6f52edL594-R614))
  • Added a new MatmulDequantizeScheduler and related functions in bitblas/ops/general_matmul/tilelang/dequantize/__init__.py. ([bitblas/ops/general_matmul/tilelang/dequantize/__init__.pyR3-R102](https://github.com/microsoft/BitBLAS/pull/214/files#diff-422bb6fd30915da2280e418fe97aab5bdf246b548321577b128cd1652bd68ec2R3-R102))
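As a rough illustration of the scheduler selection described above, the sketch below picks a dequantize scheduler when the weight dtype differs from the activation dtype and a dense scheduler otherwise. The `MatmulConfigSketch` class, the `select_scheduler_name` helper, and the dtype-comparison rule are assumptions made for illustration; they are not the actual `_select_scheduler` logic in bitblas/ops/general_matmul/__init__.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatmulConfigSketch:
    """Hypothetical stand-in for an operator config (not the BitBLAS class)."""
    A_dtype: str = "float16"  # activation dtype
    W_dtype: str = "float16"  # stored weight dtype


def select_scheduler_name(cfg: MatmulConfigSketch) -> str:
    """Return the name of the scheduler a dispatcher might choose.

    Assumption: weights stored in a different (lower-precision) dtype than
    the activations must be dequantized before the matmul.
    """
    if cfg.A_dtype != cfg.W_dtype:
        return "MatmulDequantizeScheduler"
    return "MatmulScheduler"


dense = select_scheduler_name(MatmulConfigSketch("float16", "float16"))
dequant = select_scheduler_name(MatmulConfigSketch("float16", "int4"))
```

Keeping the decision in one small function means the dense and dequantize paths can evolve independently while callers see a single entry point.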

Code Simplification and Refactoring:

  • Refactored the main function in bitblas/ops/general_matmul/tilelang/dense/matmul_tensorcore.py for better readability and maintainability. ([[1]](https://github.com/microsoft/BitBLAS/pull/214/files#diff-eacc57f40c9e810b3503e297685ed5eb8372c922201e18a4be1c1c2e20c93353L515-R522), [[2]](https://github.com/microsoft/BitBLAS/pull/214/files#diff-eacc57f40c9e810b3503e297685ed5eb8372c922201e18a4be1c1c2e20c93353L652-R662), [[3]](https://github.com/microsoft/BitBLAS/pull/214/files#diff-eacc57f40c9e810b3503e297685ed5eb8372c922201e18a4be1c1c2e20c93353L686-R700), [[4]](https://github.com/microsoft/BitBLAS/pull/214/files#diff-eacc57f40c9e810b3503e297685ed5eb8372c922201e18a4be1c1c2e20c93353L867-R884), [[5]](https://github.com/microsoft/BitBLAS/pull/214/files#diff-eacc57f40c9e810b3503e297685ed5eb8372c922201e18a4be1c1c2e20c93353L979-R999), [[6]](https://github.com/microsoft/BitBLAS/pull/214/files#diff-eacc57f40c9e810b3503e297685ed5eb8372c922201e18a4be1c1c2e20c93353L1009-R1033))
  • Updated the __repr__ method in bitblas/ops/general_matmul/tilelang/dense/matmul_tensorcore.py to include warp_M and warp_N details. ([bitblas/ops/general_matmul/tilelang/dense/matmul_tensorcore.pyR320-R321](https://github.com/microsoft/BitBLAS/pull/214/files#diff-eacc57f40c9e810b3503e297685ed5eb8372c922201e18a4be1c1c2e20c93353R320-R321))

Testing Enhancements:

  • Added new test functions matmul_torch_forward and matmul_torch_forward_dequant to testing/python/operators/test_general_matmul_ops_backend_tl.py for validating matrix multiplication operations with and without dequantization. ([[1]](https://github.com/microsoft/BitBLAS/pull/214/files#diff-cb0d29b36116f888480a1c2a5cee67a69ad3d5434522ce51fb92731242adc2cfR78-R227), [[2]](https://github.com/microsoft/BitBLAS/pull/214/files#diff-cb0d29b36116f888480a1c2a5cee67a69ad3d5434522ce51fb92731242adc2cfR248-R259))
  • Included MatmulDequantizeScheduler in the imports of testing/python/operators/test_general_matmul_tilelang_kernel.py. ([testing/python/operators/test_general_matmul_tilelang_kernel.pyR13-R15](https://github.com/microsoft/BitBLAS/pull/214/files#diff-052c67c47659338f3612e7c47384c54f88929e9b32c40d9e797b3f5307ff3896R13-R15))
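To make the idea of checking a matmul "with dequantization" concrete, here is a minimal pure-Python reference, not the code of matmul_torch_forward_dequant. It assumes unsigned 4-bit weights packed two per byte (low nibble first), a single scale, a zero point of 8, and a weight stored as N x K; all helper names are hypothetical.

```python
def pack_int4(values):
    """Pack unsigned 4-bit values (0..15) two per byte, low nibble first."""
    assert len(values) % 2 == 0
    return [(values[i] & 0xF) | ((values[i + 1] & 0xF) << 4)
            for i in range(0, len(values), 2)]


def unpack_int4(packed):
    """Inverse of pack_int4."""
    out = []
    for byte in packed:
        out.append(byte & 0xF)
        out.append((byte >> 4) & 0xF)
    return out


def dequantize_row(packed, scale, zero=8):
    """Map stored 4-bit codes back to real values: (q - zero) * scale."""
    return [(q - zero) * scale for q in unpack_int4(packed)]


def matmul_nt(A, W):
    """C[m][n] = sum_k A[m][k] * W[n][k], with W stored as N x K."""
    return [[sum(a * w for a, w in zip(a_row, w_row)) for w_row in W]
            for a_row in A]


A = [[1.0, 2.0, 3.0, 4.0]]
W_packed = [pack_int4([8, 9, 7, 10])]           # one output row, K = 4
W = [dequantize_row(row, scale=0.5) for row in W_packed]
C = matmul_nt(A, W)                             # reference result
```

A real test would compare a kernel's output against such a reference within a tolerance rather than for exact equality.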

Dependency Updates:

  • Updated yapf version in requirements-dev.txt and requirements-test.txt to 0.40.2. ([[1]](https://github.com/microsoft/BitBLAS/pull/214/files#diff-2b4945591edfeaa4cf4d3f155e66d4b43d1bda7a55d881d5cf3107f1b05abbbcL2-R2), [[2]](https://github.com/microsoft/BitBLAS/pull/214/files#diff-685da804fbcac569d75387e475e57d1de687a54c6c41b3aa4057694cfb5abc4bL2-R2))

Miscellaneous:

  • Added interleave_weight function to the imports in bitblas/quantization/__init__.py. ([bitblas/quantization/__init__.pyL12-R16](https://github.com/microsoft/BitBLAS/pull/214/files#diff-aeb54b540a85cbc63bdf9e661a713906e58ddb8c69f56d090bd811e2ba9b4b97L12-R16))

The select_scheduler function in the dense/__init__.py module has been refactored to use a fine-grained interface. This change provides more flexibility and enables the implementation of high-performance kernels.

Update MatmulScheduler class in matmul_tensorcore.py

The MatmulScheduler class in the matmul_tensorcore.py module now calculates the number of threads from the block size and the warp size, so that the thread count matches the warp layout of the block on NVIDIA GPUs.
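The thread-count calculation can be sketched as follows. This is an illustration under stated assumptions, not the code in matmul_tensorcore.py: it assumes each warp owns a warp_M x warp_N tile of the block_M x block_N output tile and that a warp has 32 threads; the function name `threads_per_block` is hypothetical.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def threads_per_block(block_M, block_N, warp_M, warp_N, warp_size=WARP_SIZE):
    """Threads needed when every warp computes one warp_M x warp_N tile."""
    if block_M % warp_M or block_N % warp_N:
        raise ValueError("block tile must be a multiple of the warp tile")
    block_row_warps = block_M // warp_M  # warps along the M dimension
    block_col_warps = block_N // warp_N  # warps along the N dimension
    return block_row_warps * block_col_warps * warp_size


threads = threads_per_block(block_M=128, block_N=128, warp_M=64, warp_N=64)
```

With a 128 x 128 block and 64 x 64 warp tiles this gives 2 x 2 warps, that is 128 threads.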

Improve test_general_matmul_tilelang_kernel.py

The test_general_matmul_tilelang_kernel.py module has been improved to include additional test cases and assertions for correctness.
…_tilelang_kernel.py to use centered random values for input tensors
@LeiWang1999

Although TL gives us more flexibility in writing kernels, it is hard to implement all dequant kernels within a single simple template (as we do with our TIR-schedule-based template).
