Kernel launch errors from `cudaGetLastError` are not handled #545

aryan-programmer · 2023-10-06T04:16:50Z

`enqueue_raw_kernel_launch_in_current_context` does not check for an error with `cudaGetLastError` after launching, so kernel launch errors such as `cudaErrorInvalidConfiguration` go uncaught and the kernel launch fails silently.

Minimal example:

The following example launches a kernel whose launch should fail: the maximum number of threads per block is 1024, and we try to launch it with 1500 threads per block.

#include <device_launch_parameters.h>
#include <stdio.h>
#include <iostream>
#include <cuda/api.hpp>

using namespace std;

__global__ void test_kernel() {
	int i = blockIdx.x * blockDim.x + threadIdx.x;
	if (i == 0) {
		printf("Hello CUDA\n");
	}
}

int main() {
	if (cuda::device::count() == 0) {
		std::cerr << "No CUDA devices on this system" << "\n";
		exit(EXIT_FAILURE);
	}

	cuda::device::current::set(cuda::device::get(0));
	auto device = cuda::device::current::get();

	try {
		cout << "Executing the kernel:" << endl;
		cuda::launch_configuration_t lc = cuda::launch_config_builder()
										 .overall_size(2048)
										 .block_dimensions(1500)
										 .build();
		cuda::launch(test_kernel, lc);
	} catch (const std::exception& ex) {
		cout << ex.what() << endl;
	}
	cuda::synchronize(device);

	return 0;
}

Current output:

Executing the kernel:

The kernel launch fails silently.

Expected output:

Executing the kernel:
Kernel launch failed: invalid configuration argument

The kernel launch error reported by `cudaGetLastError` should be converted into an exception, which is then caught here.
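
For illustration only, here is a minimal sketch of the requested behaviour: poll the last error right after the launch and turn it into an exception. So that it compiles without a CUDA toolchain, the runtime is mocked. `status_t`, `get_last_error`, `describe`, `raw_launch`, `checked_launch` and `try_launch` are all hypothetical stand-ins, not the CUDA runtime and not this library's API, and the 1024-thread limit is simply the figure quoted above.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for cudaError_t, cudaGetLastError() and
// cudaGetErrorString(); none of these are the real CUDA runtime.
enum class status_t { success, invalid_configuration };

static status_t g_last_error = status_t::success;

// Returns the last recorded error and resets it, as cudaGetLastError() does.
status_t get_last_error() {
    status_t s = g_last_error;
    g_last_error = status_t::success;
    return s;
}

const char* describe(status_t s) {
    return s == status_t::success ? "no error" : "invalid configuration argument";
}

// Mock launch: it records an error instead of returning one, mimicking how a
// <<<...>>> launch reports a bad configuration only through the last error.
void raw_launch(unsigned threads_per_block, unsigned max_threads_per_block = 1024) {
    if (threads_per_block == 0 || threads_per_block > max_threads_per_block) {
        g_last_error = status_t::invalid_configuration;
    }
}

// Checked wrapper: poll the last error immediately after the launch and throw.
void checked_launch(unsigned threads_per_block) {
    raw_launch(threads_per_block);
    status_t s = get_last_error();
    if (s != status_t::success) {
        throw std::runtime_error(std::string("Kernel launch failed: ") + describe(s));
    }
}

// Small helper for demonstration: returns "ok" or the exception message.
std::string try_launch(unsigned threads_per_block) {
    try {
        checked_launch(threads_per_block);
        return "ok";
    } catch (const std::exception& ex) {
        return ex.what();
    }
}
```

With this mock, `try_launch(1500)` yields the message shown in the expected output, while `try_launch(32)` yields `"ok"`. In real code the wrapper would call `cudaGetLastError()` and `cudaGetErrorString()` instead of the stand-ins.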

eyalroz · 2023-10-07T08:24:13Z

Please verify that this works for you now on the development branch.

aryan-programmer · 2023-10-07T09:32:25Z

It catches the error in the release build correctly.

However, in the debug build, the launch parameter validation does catch the error properly, but it also raises false alarms, as follows:

Specifying the block_dimensions to be 32 results in the following error:

Executing the kernel:
specified block X-axis dimension 32 exceeds the maximum supported X dimension of 1024 for device 0

Specifying the block_dimensions to be 1024 results in the following error:

Executing the kernel:
specified block Y-axis dimension 1 exceeds the maximum supported Y dimension of 1024 for device 0

Both of these should have worked and printed "Hello CUDA". Indeed, in the release build (with block_dimensions set to 32 or 1024) the output is:

Executing the kernel:
Hello CUDA
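
The false alarms above are what one would expect from a bounds check written the wrong way round, which matches the title of the linked issue #550. The following is a hedged sketch of such a check with the comparison in the intended direction. `dims_t` and `validate_block_dimensions` are hypothetical names for illustration, not the library's actual code; the message format is copied from the output above.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical type; not the library's own definition.
using dims_t = std::array<unsigned, 3>;  // x, y, z

// Throws if any requested block dimension exceeds the device maximum for
// that axis. The check must fire on `requested > maximum`; writing it the
// other way round rejects valid sizes such as 32 against a maximum of 1024.
void validate_block_dimensions(const dims_t& requested, const dims_t& maximum, int device_id) {
    static const char axis_names[3] = {'X', 'Y', 'Z'};
    for (std::size_t i = 0; i < 3; ++i) {
        if (requested[i] > maximum[i]) {
            throw std::invalid_argument(
                std::string("specified block ") + axis_names[i] + "-axis dimension "
                + std::to_string(requested[i]) + " exceeds the maximum supported "
                + axis_names[i] + " dimension of " + std::to_string(maximum[i])
                + " for device " + std::to_string(device_id));
        }
    }
}
```

Under this check, block dimensions of 32 or 1024 along X pass against a maximum of 1024, and only a value such as 1500 is rejected.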

eyalroz · 2023-10-07T14:56:29Z

Can you try again?

aryan-programmer · 2023-10-08T07:26:41Z

It works correctly now.

eyalroz self-assigned this Oct 7, 2023

eyalroz added bug resolved-on-development labels Oct 7, 2023

eyalroz mentioned this issue Oct 7, 2023

Block dimension validation is comparing in the wrong direction #550

Closed

eyalroz closed this as completed in ef07b10 Oct 15, 2023
