In the multidevice transformer tests there are extra FillFunctor kernels.
Previously we generated these kernels:
ampere_bf16_s16816gemm_bf16_256x128_ldg8_f2f_stages_64x3_nn
<unnamed>::nvfuser_pointwise_f0_c1_r0_g12(<unnamed>::Tensor<<unnamed>::__bfloat, (int)2, (int)2>, <unnamed>::Tensor<<unnamed>::__bfloat, (int)3, (int)3>, <unnamed>::Tensor<<unnamed>::__bfloat, (int)3, (int)3>)
ampere_bf16_s16816gemm_bf16_256x128_ldg8_f2f_stages_64x3_nn
ncclDevKernel_AllReduce_Sum_bf16_RING_LL(ncclDevKernelArgsStorage<(unsigned long)4096>)
<unnamed>::nvfuser_pointwise_f0_c1_r0_g15(<unnamed>::Tensor<<unnamed>::__bfloat, (int)1, (int)1>, <unnamed>::Tensor<<unnamed>::__bfloat, (int)2, (int)2>, <unnamed>::Tensor<float, (int)2, (int)2>, long long *, long long, long long *, long long, <unnamed>::Tensor<float, (int)2, (int)2>, <unnamed>::Tensor<float, (int)2, (int)2>)
Currently we generate these (note that the different gemm is intentional; it resulted from switching from matmul + bias add to linear):
ampere_bf16_s16816gemm_bf16_128x64_ldg8_relu_f2f_stages_64x4_tn
void at::native::vectorized_elementwise_kernel<(int)4, at::native::FillFunctor<c10::BFloat16>, at::detail::Array<char *, (int)1>>(int, T2, T3)
<unnamed>::nvfuser_pointwise_f0_c1_r0_g4(<unnamed>::Tensor<<unnamed>::__bfloat, (int)3, (int)3>, <unnamed>::Tensor<<unnamed>::__bfloat, (int)3, (int)3>)
ampere_bf16_s16816gemm_bf16_64x64_sliced1x2_ldg8_f2f_stages_64x5_tn
void at::native::vectorized_elementwise_kernel<(int)4, at::native::FillFunctor<c10::BFloat16>, at::detail::Array<char *, (int)1>>(int, T2, T3)
ncclDevKernel_AllReduce_Sum_bf16_RING_LL(ncclDevKernelArgsStorage<(unsigned long)4096>)
void at::native::vectorized_elementwise_kernel<(int)4, at::native::FillFunctor<float>, at::detail::Array<char *, (int)1>>(int, T2, T3)
void at::native::vectorized_elementwise_kernel<(int)4, at::native::FillFunctor<float>, at::detail::Array<char *, (int)1>>(int, T2, T3)
<unnamed>::nvfuser_pointwise_f0_c1_r0_g5(<unnamed>::Tensor<<unnamed>::__bfloat, (int)1, (int)1>, <unnamed>::Tensor<<unnamed>::__bfloat, (int)2, (int)2>, long long *, long long, long long *, long long, <unnamed>::Tensor<float, (int)3, (int)3>, <unnamed>::Tensor<float, (int)3, (int)3>)
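For context, FillFunctor is the functor ATen's eager-mode fill kernels use, so these extra launches indicate some tensor is being filled eagerly (e.g. via fill_ or zero_-style initialization) rather than inside a fused kernel. A minimal sketch of how one might confirm that with torch.profiler (assumes a CUDA build of PyTorch; cuda_kernel_names is a hypothetical helper for this illustration, not part of the test suite):

```python
import torch
from torch.profiler import profile, ProfilerActivity

def cuda_kernel_names(fn):
    """Run fn under the PyTorch profiler and return the CUDA event names seen."""
    with profile(activities=[ProfilerActivity.CUDA]) as prof:
        fn()
    return [evt.key for evt in prof.key_averages()]

if torch.cuda.is_available():
    # Eager-mode fill_ on a CUDA tensor dispatches a kernel that shows up as
    # at::native::vectorized_elementwise_kernel<..., FillFunctor<c10::BFloat16>, ...>
    x = torch.empty(1024, device="cuda", dtype=torch.bfloat16)
    names = cuda_kernel_names(lambda: x.fill_(0.0))
    print([n for n in names if "FillFunctor" in n])
```

Wrapping the test's forward/backward call in the same helper would show which step introduces the fills.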
cowanmeg