Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable serial grid reduction for split-K in matmul scheduler #1510

Merged
merged 41 commits into from
Jan 23, 2024

Commits on Dec 8, 2023

  1. Configuration menu
    Copy the full SHA
    96225e8 View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    f9d2d01 View commit details
    Browse the repository at this point in the history
  3. Configuration menu
    Copy the full SHA
    6454b06 View commit details
    Browse the repository at this point in the history
  4. Disable previous changes to indexing pass.

    Will revisit once sync pass is done, when we have a TensorIndex
    jacobhinkle committed Dec 8, 2023
    Configuration menu
    Copy the full SHA
    1142e44 View commit details
    Browse the repository at this point in the history
  5. Configuration menu
    Copy the full SHA
    1368ba8 View commit details
    Browse the repository at this point in the history
  6. Configuration menu
    Copy the full SHA
    b97db45 View commit details
    Browse the repository at this point in the history
  7. Configuration menu
    Copy the full SHA
    ac2da9d View commit details
    Browse the repository at this point in the history
  8. Configuration menu
    Copy the full SHA
    ebef797 View commit details
    Browse the repository at this point in the history
  9. Add test

    jacobhinkle committed Dec 8, 2023
    Configuration menu
    Copy the full SHA
    dcd8606 View commit details
    Browse the repository at this point in the history

Commits on Dec 11, 2023

  1. Fix sync insertion in lowering pass.

    Still missing allocation/indexing of work buffer
    jacobhinkle committed Dec 11, 2023
    Configuration menu
    Copy the full SHA
    e196fa7 View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    8be320a View commit details
    Browse the repository at this point in the history

Commits on Dec 12, 2023

  1. Allocate global work buffer. Index is zero for now

    I need to replay leaf transforms, then get index.
    jacobhinkle committed Dec 12, 2023
    Configuration menu
    Copy the full SHA
    41b125f View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    e465d96 View commit details
    Browse the repository at this point in the history

Commits on Dec 13, 2023

  1. Configuration menu
    Copy the full SHA
    34d623c View commit details
    Browse the repository at this point in the history
  2. Use fullSelfReplay and getGlobalConsumerStridedIndices

    Codegen is now like
    ```c++
      // Allocate global tensor T5
      reduction::serialReductionStep(
        T3[0LL],
        T2[(i14 + i18)],
        0.000000000e+00f,
        T5[((((((((((((nvfuser_index_t)blockIdx.x) * 8LL) + ((nvfuser_index_t)blockIdx.y)) * 4LL) + i13) * 8LL) + (i18 + nvfuser_zero)) * 4LL) + ((nvfuser_index_t)threadIdx.y)) * 32LL) + ((nvfuser_index_t)threadIdx.x))],
        [](float &a, float b) { a = a + b; },
        index_utils::maskedOffset<false, false, true>(blockIdx, gridDim) == 0,
        index_utils::maskedOffset<false, false, true>(blockIdx, gridDim) == index_utils::maskedSize<false, false, true>(gridDim) - 1,
        true,
        true);
    ```
    This looks OK, although it will get a little better with hoisting. This
    compiles, but I get an error in `runFusion`:
    ```
    C++ exception with description "Expected T5_g[ iblockIdx.x59{( ceilDiv(( ceilDiv(( ceilDiv(( ceilDiv(( ceilDiv(262144, 32) ), 4) ), 8) ), 4) ), 8) )}, iblockIdx.y60{8}, ithreadIdx.y54{4}, ithreadIdx.x52{32}, iS58{4}, iS56{8}, rblockIdx.z49{5} ] to be bound to a tensor of rank 1, but got a tensor of rank 6
    Exception raised from validateValWithConcreteValue at /opt/pytorch/nvfuser/csrc/expr_evaluator.cpp:38 (most recent call first):
    ```
    This is happening when binding inputs I believe.
    jacobhinkle committed Dec 13, 2023
    Configuration menu
    Copy the full SHA
    910ff09 View commit details
    Browse the repository at this point in the history
  3. Infer shape using allocation domain instead of root

    Fixes execution error. Test passes!
    jacobhinkle committed Dec 13, 2023
    Configuration menu
    Copy the full SHA
    507cf47 View commit details
    Browse the repository at this point in the history
  4. Update comments

    jacobhinkle committed Dec 13, 2023
    Configuration menu
    Copy the full SHA
    e44ef7e View commit details
    Browse the repository at this point in the history
  5. Hoist index scalar.

    Generated kernel now looks like
    ```c++
      // Allocate global tensor T4
      grid_sync::blockSerializeWait<false, false, true>(&T4[index_utils::maskedOffset<true, true, false>(blockIdx, gridDim)]);
      #pragma unroll
      for(nvfuser_index_t i13 = 0; i13 < 4LL; ++i13) {
        nvfuser_index_t i14;
        i14 = 8LL * i13;
        nvfuser_index_t i15;
        i15 = 2048LL * i13;
        nvfuser_index_t i16;
        i16 = i4 + i15;
        nvfuser_index_t i17;
        i17 = -i15;
        #pragma unroll
        for(nvfuser_index_t i18 = 0; i18 < 8LL; ++i18) {
          nvfuser_index_t i19;
          i19 = 256LL * (i18 + nvfuser_zero);
          nvfuser_index_t i20;
          i20 = i16 + i19;
          float T3[1LL];
          T3[0LL] = 0.000000000e+00f;
          // Allocate global tensor T5
          reduction::serialReductionStep(
            T3[0LL],
            T2[(i14 + i18)],
            0.000000000e+00f,
            T5[i20],
            [](float &a, float b) { a = a + b; },
            index_utils::maskedOffset<false, false, true>(blockIdx, gridDim) == 0,
            index_utils::maskedOffset<false, false, true>(blockIdx, gridDim) == index_utils::maskedSize<false, false, true>(gridDim) - 1,
            true,
            true);
          if ((b6 && (i5 < (i17 - i19)))) {
            T1[i20]
               = T3[0LL];
          }
        }
      }
      NVFUSER_UPDATE_MAGIC_ZERO;
      grid_sync::blockSerializeRelease<false, false, true>(&T4[index_utils::maskedOffset<true, true, false>(blockIdx, gridDim)]);
    ```
    Note that the index `i20` matches the output `T1`. This is what we need
    to reclaim `T1` in a later PR; it will still be a challenge in that work
    to exact map between `T5` and `T3` in order to get `T1` and `T5` exact
    mapped...
    jacobhinkle committed Dec 13, 2023
    Configuration menu
    Copy the full SHA
    8a3134e View commit details
    Browse the repository at this point in the history
  6. Configuration menu
    Copy the full SHA
    327573c View commit details
    Browse the repository at this point in the history
  7. Clean up comments.

    jacobhinkle committed Dec 13, 2023
    Configuration menu
    Copy the full SHA
    d27a675 View commit details
    Browse the repository at this point in the history
  8. Update NVFuserTest.Pipeline_CUDA

    Also sort expected output by line to give clearer error messages.
    jacobhinkle committed Dec 13, 2023
    Configuration menu
    Copy the full SHA
    46c70b6 View commit details
    Browse the repository at this point in the history
  9. Configuration menu
    Copy the full SHA
    6d2089d View commit details
    Browse the repository at this point in the history

Commits on Dec 20, 2023

  1. Configuration menu
    Copy the full SHA
    751f326 View commit details
    Browse the repository at this point in the history

Commits on Jan 4, 2024

  1. Configuration menu
    Copy the full SHA
    f4ad5ff View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    1149b43 View commit details
    Browse the repository at this point in the history

Commits on Jan 8, 2024

  1. Configuration menu
    Copy the full SHA
    94e55b8 View commit details
    Browse the repository at this point in the history

Commits on Jan 10, 2024

  1. Configuration menu
    Copy the full SHA
    f2d7461 View commit details
    Browse the repository at this point in the history

Commits on Jan 11, 2024

  1. Fix compile error

    jacobhinkle committed Jan 11, 2024
    Configuration menu
    Copy the full SHA
    a864184 View commit details
    Browse the repository at this point in the history

Commits on Jan 12, 2024

  1. Configuration menu
    Copy the full SHA
    c236810 View commit details
    Browse the repository at this point in the history

Commits on Jan 16, 2024

  1. Configuration menu
    Copy the full SHA
    819a6e0 View commit details
    Browse the repository at this point in the history

Commits on Jan 18, 2024

  1. Configuration menu
    Copy the full SHA
    cf42527 View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    449a296 View commit details
    Browse the repository at this point in the history

Commits on Jan 19, 2024

  1. Configuration menu
    Copy the full SHA
    27e449a View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    c6ddebf View commit details
    Browse the repository at this point in the history
  3. Configuration menu
    Copy the full SHA
    2c3278b View commit details
    Browse the repository at this point in the history
  4. Configuration menu
    Copy the full SHA
    4e40d68 View commit details
    Browse the repository at this point in the history
  5. Restore split-k benchmarks

    These were disabled in #1545 because of slow compilation with gridReduce
    jacobhinkle committed Jan 19, 2024
    Configuration menu
    Copy the full SHA
    7bfa709 View commit details
    Browse the repository at this point in the history
  6. Fix after rebase

    jacobhinkle committed Jan 19, 2024
    Configuration menu
    Copy the full SHA
    fc07a9a View commit details
    Browse the repository at this point in the history

Commits on Jan 23, 2024

  1. Configuration menu
    Copy the full SHA
    e9111e6 View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    4f93290 View commit details
    Browse the repository at this point in the history
  3. Configuration menu
    Copy the full SHA
    915cd97 View commit details
    Browse the repository at this point in the history