Merge pull request #2 from CNugteren/development
Updated to version 3.0
CNugteren committed Sep 4, 2015
2 parents ece8586 + e7a8c37 commit 793c5b9
Showing 12 changed files with 9,724 additions and 128 deletions.
25 changes: 25 additions & 0 deletions CHANGELOG
@@ -0,0 +1,25 @@

Version 3.0 (2015-09-04):
- Renamed the project from 'Claduc' to 'CLCudaAPI'
- SetArgument now takes both l-value and r-value arguments
- Added first version of a test infrastructure
- Added new methods to the API:
* Platform::NumDevices
* Buffer::Buffer (a constructor with default read-write access)
  * Buffer::Buffer (a constructor that fills the buffer with data from C++ start/end iterators)
* Kernel::Launch (version with default OpenCL workgroup size)

Version 2.0 (2015-07-13):
- Allows device program string to be moved into Program at construction
- Cleaned-up device-information methods
- Added new methods to the API:
* Device::CoreClock,
* Device::ComputeUnits,
* Device::MemorySize,
* Device::MemoryClock,
* Device::MemoryBusWidth
* Program::GetIR
* Kernel::SetArguments

Version 1.0 (2015-07-09):
- Initial version
28 changes: 23 additions & 5 deletions CMakeLists.txt
@@ -1,6 +1,6 @@

# ==================================================================================================
# This file is part of the Claduc project. The project is licensed under Apache Version 2.0. This
# This file is part of the CLCudaAPI project. The project is licensed under Apache Version 2.0. This
# project loosely follows the Google C++ styleguide and uses a tab-size of two spaces and a max-
# width of 100 characters per line.
#
@@ -30,12 +30,15 @@

# CMake project details
cmake_minimum_required(VERSION 2.8.10)
project("Claduc" CXX)
set(Claduc_VERSION_MAJOR 2)
set(Claduc_VERSION_MINOR 0)
project("CLCudaAPI" CXX)
set(CLCudaAPI_VERSION_MAJOR 3)
set(CLCudaAPI_VERSION_MINOR 0)

# ==================================================================================================

# Enable tests
option(ENABLE_TESTS "Build test-suite" ON)

# Select between OpenCL and CUDA back-end
option(USE_OPENCL "Use OpenCL instead of CUDA" ON)
if(USE_OPENCL)
@@ -89,7 +92,7 @@ endif()
# ==================================================================================================

# Include directories: C++11 headers and OpenCL/CUDA includes
include_directories(${Claduc_SOURCE_DIR}/include)
include_directories(${CLCudaAPI_SOURCE_DIR}/include)
if(USE_OPENCL)
include_directories(${OPENCL_INCLUDE_DIRS})
else()
@@ -118,3 +121,18 @@ foreach(SAMPLE ${SAMPLE_PROGRAMS})
endforeach()

# ==================================================================================================

# Optional: Enable inclusion of the test-suite
if (ENABLE_TESTS)
enable_testing()
include_directories(${CLCudaAPI_SOURCE_DIR}/test)
add_executable(unit_tests test/unit_tests.cc)
if(USE_OPENCL)
target_link_libraries(unit_tests ${OPENCL_LIBRARIES})
else()
target_link_libraries(unit_tests cuda nvrtc)
endif()
add_test(unit_tests unit_tests)
endif()

# ==================================================================================================
48 changes: 29 additions & 19 deletions README.md
@@ -1,10 +1,10 @@

Claduc: A portable high-level API with CUDA or OpenCL back-end
CLCudaAPI: A portable high-level API with CUDA or OpenCL back-end
================

Claduc provides a C++ interface to the OpenCL API and/or CUDA API. This interface is high-level: all the details of setting up an OpenCL platform and device are handled automatically, as well as for example OpenCL and CUDA memory management. A similar high-level API is also provided by Khronos's `cl.hpp`, so why would someone use Claduc instead? The main reason is portability: Claduc provides two header files which both implement the exact same API, but with a different back-end. This allows __porting between OpenCL and CUDA by simply changing the header file!__
CLCudaAPI provides a C++ interface to the OpenCL API and/or CUDA API. This interface is high-level: all the details of setting up an OpenCL platform and device are handled automatically, as is, for example, OpenCL and CUDA memory management. A similar high-level API is also provided by Khronos's `cl.hpp`, so why would someone use CLCudaAPI instead? The main reason is portability: CLCudaAPI provides two header files which both implement the exact same API, but with a different back-end. This allows __porting between OpenCL and CUDA by simply changing the header file!__

Claduc is written in C++11 and wraps CUDA and OpenCL objects in smart pointers, thus handling memory management automatically. It uses the CUDA driver API, since this is the closest to the OpenCL API, but it uses the OpenCL terminology, since this is the most generic. It compiles OpenCL and/or CUDA kernels at run-time, possible in CUDA only since release 7.0. Claduc handles the host API only: it still requires two versions of the kernel (although some simple defines could omit this requirement).
CLCudaAPI is written in C++11 and wraps CUDA and OpenCL objects in smart pointers, thus handling memory management automatically. It uses the CUDA driver API, since this is the closest to the OpenCL API, but it uses OpenCL terminology, since this is the most generic. It compiles OpenCL and/or CUDA kernels at run-time, which is possible in CUDA only since release 7.0. CLCudaAPI handles the host API only: it still requires two versions of the kernel (although some simple defines could remove this requirement).


What does it look like?
@@ -21,57 +21,57 @@ To get started, include either of the two headers:
Here is a simple example of setting up platform 0 and selecting device 2:

```c++
auto platform = Claduc::Platform(0);
auto device = Claduc::Device(platform, 2);
auto platform = CLCudaAPI::Platform(0);
auto device = CLCudaAPI::Device(platform, 2);
```

Next, we'll create a CUDA/OpenCL context and a queue (== CUDA stream) on this device:

```c++
auto context = Claduc::Context(device);
auto queue = Claduc::Queue(context, device);
auto context = CLCudaAPI::Context(device);
auto queue = CLCudaAPI::Queue(context, device);
```

And, once the context and queue are created, we can allocate and upload data to the device:

```c++
auto host_mem = std::vector<float>(size);
auto device_mem = Claduc::Buffer<float>(context, Claduc::BufferAccess::kReadWrite, size);
auto device_mem = CLCudaAPI::Buffer<float>(context, size);
device_mem.WriteBuffer(queue, size, host_mem);
```
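
Since version 3.0, a buffer can also be created and filled in a single step by passing begin/end iterators of a C++ container, using the iterator-based constructor described in the API reference. For example, re-using `host_mem` from the snippet above:

```c++
auto device_mem2 = CLCudaAPI::Buffer<float>(context, queue, host_mem.begin(), host_mem.end());
```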

Further examples are included in the `samples` folder. To start with Claduc, check out `samples/simple.cc`, which shows how to compile and launch a simple kernel. The full [Claduc API reference](doc/api.md) is also available in the current repository.
Further examples are included in the `samples` folder. To get started with CLCudaAPI, check out `samples/simple.cc`, which shows how to compile and launch a simple kernel. The full [CLCudaAPI API reference](doc/api.md) is also available in this repository.


Why would I use Claduc?
Why would I use CLCudaAPI?
-------------

The main reasons to use Claduc are:
The main reasons to use CLCudaAPI are:

* __Portability__: the CUDA and OpenCL Claduc headers implement the exact same API.
* __Portability__: the CUDA and OpenCL CLCudaAPI headers implement the exact same API.
* __Memory management__: smart pointers allocate and free memory automatically.
* __Error checking__: all CUDA and OpenCL API calls are automatically checked for errors.
* __Abstraction__: Claduc provides a higher-level interface than OpenCL, CUDA, and `cl.hpp`.
* __Abstraction__: CLCudaAPI provides a higher-level interface than OpenCL, CUDA, and `cl.hpp`.
* __Easy to use__: simply ship two OS/hardware-independent header files, no compilation needed.
* __Low overhead__ : all function calls are automatically in-lined by the compiler.
* __Native compiler__: Claduc code can be compiled with a normal C++ compiler, there is no need to use `nvcc`.
* __Native compiler__: CLCudaAPI code can be compiled with a normal C++ compiler, there is no need to use `nvcc`.

Nevertheless, there are also several cases when Claduc is not suitable:
Nevertheless, there are also several cases when CLCudaAPI is not suitable:

* When fine-grained control is desired: Claduc makes abstractions to certain OpenCL/CUDA handles and settings.
* When fine-grained control is desired: CLCudaAPI makes abstractions to certain OpenCL/CUDA handles and settings.
* When unsupported features are desired: only the most common cases are currently implemented. Although this is not a fundamental limitation, it is a practical one. For example, OpenGL interoperability and CUDA constant/texture memory are not supported.
* When run-time compilation is not an option: e.g. when compilation overhead is too high.

What are the pre-requisites?
-------------

The requirements to use the Claduc headers are:
The requirements to use the CLCudaAPI headers are:

* CUDA 7.0 or higher (for run-time compilation)
* OpenCL 1.1 or higher
* A C++11 compiler (e.g. GCC 4.7 or newer)

If you also want to compile the samples using the provided infrastructure, you'll also need:
If you also want to compile the samples and tests using the provided infrastructure, you'll need:

* CMake 2.8.10 or higher

@@ -91,9 +91,19 @@ make
Replace `-DUSE_OPENCL=ON` with `-DUSE_OPENCL=OFF` to use CUDA instead of OpenCL as a back-end. After compilation, the `build` folder will contain a binary for each of the sample programs included in the `samples` subfolder.


How do I compile the included test-suite with CMake?
-------------

Compiling the examples (see above) will also compile the tests (unless `-DENABLE_TESTS=OFF` is set). The tests use either the OpenCL or the CUDA back-end, just like the samples. After compilation, the tests can be run using CTest or as follows:

```bash
./unit_tests
```


FAQ
-------------

> Q: __After I include the Claduc CUDA header, the linker finds an undefined reference to `nvrtcGetErrorString'. What should I do?__
> Q: __After I include the CLCudaAPI CUDA header, the linker finds an undefined reference to `nvrtcGetErrorString`. What should I do?__
>
> A: You need to link against the NVIDIA Run-Time Compilation Library (NVRTC). For example, pass `-lnvrtc -L/opt/cuda/lib64` to the compiler.
2 changes: 1 addition & 1 deletion cmake/Modules/FindOpenCL.cmake
@@ -1,6 +1,6 @@

# ==================================================================================================
# This file is part of the Claduc project. The project is licensed under Apache Version 2.0. This
# This file is part of the CLCudaAPI project. The project is licensed under Apache Version 2.0. This
# project loosely follows the Google C++ styleguide and uses a tab-size of two spaces and a max-
# width of 100 characters per line.
#
38 changes: 23 additions & 15 deletions doc/api.md
@@ -1,10 +1,10 @@
Claduc: API reference
CLCudaAPI: API reference
================

This file describes the high-level API for both the CUDA and OpenCL back-end of the Claduc headers. On top of the described API, each class has a constructor which takes the regular OpenCL or CUDA data-type and transforms it into a Claduc class. Furthermore, each class also implements a `()` operator which returns the regular OpenCL or CUDA data-type.
This file describes the high-level API for both the CUDA and OpenCL back-ends of the CLCudaAPI headers. On top of the described API, each class has a constructor which takes the regular OpenCL or CUDA data-type and transforms it into a CLCudaAPI class. Furthermore, each class also implements a `()` operator which returns the regular OpenCL or CUDA data-type.


Claduc::Event
CLCudaAPI::Event
-------------

Constructor(s):
@@ -18,7 +18,7 @@ Public method(s):
Retrieves the elapsed time in milliseconds of the last recorded event (e.g. a device kernel). This method first makes sure that the last event is finished before computing the elapsed time.
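
For example, timing a kernel launch might look as follows. This is a minimal sketch which assumes that `Event` is default-constructible (its constructor list is collapsed in this diff) and that `kernel`, `queue`, `global`, and `local` are set up as in the `Kernel` section below:

```c++
auto event = CLCudaAPI::Event();
kernel.Launch(queue, global, local, event);   // the elapsed time is recorded into `event`
auto milliseconds = event.GetElapsedTime();   // waits for the event and reports the elapsed time
```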


Claduc::Platform
CLCudaAPI::Platform
-------------

Constructor(s):
@@ -27,7 +27,7 @@ Constructor(s):
When using the OpenCL back-end, this initializes a new OpenCL platform (e.g. AMD SDK, Intel SDK, NVIDIA SDK) specified by the integer `platform_id`. When using the CUDA back-end, this initializes the CUDA driver API. The `platform_id` argument is ignored: there is only one platform.
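
For example, combined with `Platform::NumDevices` (new in version 3.0) and the `Device` constructor below, the devices of platform 0 can be enumerated. A sketch assuming `NumDevices` returns the number of devices as a `size_t`:

```c++
auto platform = CLCudaAPI::Platform(0);
for (auto device_id = size_t{0}; device_id < platform.NumDevices(); ++device_id) {
  auto device = CLCudaAPI::Device(platform, device_id);
  // ... query or use the device here
}
```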


Claduc::Device
CLCudaAPI::Device
-------------

Constructor(s):
@@ -84,16 +84,16 @@ Given a requested amount of local on-chip scratchpad memory, this method returns
Given a requested OpenCL work-group or CUDA thread-block configuration `local`, this method returns whether or not this is a valid configuration for this particular device.


Claduc::Context
CLCudaAPI::Context
-------------

Constructor(s):

* `Context(const Device &device)`:
Initializes a new context on a given device. On top of this context, Claduc can create new programs, queues and buffers.
Initializes a new context on a given device. On top of this context, CLCudaAPI can create new programs, queues and buffers.


Claduc::Program
CLCudaAPI::Program
-------------

Constant(s):
@@ -117,7 +117,7 @@ Retrieves all compiler warnings and errors generated by the build process.
* `std::string GetIR() const`:
Retrieves the intermediate representation (IR) of the compiled program. When using the CUDA back-end, this returns the PTX code. For the OpenCL back-end, this returns either an IR (e.g. PTX) or a binary; this differs per OpenCL implementation.

Claduc::Queue
CLCudaAPI::Queue
-------------

Constructor(s):
@@ -137,7 +137,7 @@ Retrieves the CUDA/OpenCL context associated with this queue.
Retrieves the CUDA/OpenCL device associated with this queue.


template \<typename T\> Claduc::BufferHost
template \<typename T\> CLCudaAPI::BufferHost
-------------

Constructor(s):
@@ -154,7 +154,7 @@ Retrieves the allocated size in bytes.
Adds some compatibility with `std::vector` by implementing the `size`, `begin`, `end`, `operator[]`, and `data` methods.


template \<typename T\> Claduc::Buffer
template \<typename T\> CLCudaAPI::Buffer
-------------

Constant(s):
@@ -167,6 +167,12 @@ Constructor(s):
* `Buffer(const Context &context, const BufferAccess access, const size_t size)`:
Initializes a new linear 1D memory buffer of type T on the device. This buffer is allocated with a fixed number of elements given by `size`. Note that the buffer's elements are not initialized. The buffer can be read-only, write-only, or read-write, as specified by the `access` argument.

* `Buffer(const Context &context, const size_t size)`:
As above, but now defaults to read-write access.

* `template <typename Iterator> Buffer(const Context &context, const Queue &queue, Iterator start, Iterator end)`:
Creates a new buffer based on data in a linear C++ container (such as `std::vector`). The size is determined by the difference between the end and start iterators. This method both creates a new buffer and writes data to it. It synchronises the queue before returning.
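
For example, the three constructor variants can be used as follows; a sketch assuming an existing `context` and `queue` created as shown in the README:

```c++
auto input = std::vector<float>(1024, 1.0f);
auto a = CLCudaAPI::Buffer<float>(context, CLCudaAPI::BufferAccess::kReadWrite, 1024);  // explicit access type
auto b = CLCudaAPI::Buffer<float>(context, 1024);                                       // defaults to read-write
auto c = CLCudaAPI::Buffer<float>(context, queue, input.begin(), input.end());          // created and filled in one go
```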

Public method(s):

* `void ReadAsync(const Queue &queue, const size_t size, T* host)` and
@@ -199,7 +205,7 @@ As above, but now completes the operation before returning.
Retrieves the allocated size in bytes.


Claduc::Kernel
CLCudaAPI::Kernel
-------------

Constructor(s):
@@ -209,14 +215,16 @@ Retrieves a new kernel from a compiled program. The kernel name is given as the

Public method(s):

* `template <typename T> void SetArgument(const size_t index, T &value)`:
Method to set a kernel argument. The argument itself (`value`) has to be a non-const l-value, since its address it passed to the OpenCL/CUDA back-end. The argument `index` specifies the position in the list of kernel arguments. The argument `value` can also be a `Claduc::Buffer`.
* `template <typename T> void SetArgument(const size_t index, const T &value)`:
Method to set a kernel argument (l-value or r-value). The argument `index` specifies the position in the list of kernel arguments. The argument `value` can also be a `CLCudaAPI::Buffer`.

* `template <typename... Args> void SetArguments(Args&... args)`: As above, but now sets all arguments in one go, starting at index 0. This overwrites any previous arguments (if any). The parameter pack `args` takes any number of arguments of different types, including `Claduc::Buffer`.
* `template <typename... Args> void SetArguments(Args&... args)`: As above, but now sets all arguments in one go, starting at index 0. This overwrites any previous arguments (if any). The parameter pack `args` takes any number of arguments of different types, including `CLCudaAPI::Buffer`.

* `size_t LocalMemUsage(const Device &device) const`:
Retrieves the amount of on-chip scratchpad memory (local memory in OpenCL, shared memory in CUDA) required by this specific kernel.

* `Launch(const Queue &queue, const std::vector<size_t> &global, const std::vector<size_t> &local, Event &event)`:
Launches a kernel onto the specified queue. The kernel launch is asynchronous: this method can return before the device kernel has completed. The total number of threads launched is given by the `global` vector; the number of threads per OpenCL work-group or CUDA thread-block is given by the `local` vector. The elapsed time is recorded into the `event` argument.

* `Launch(const Queue &queue, const std::vector<size_t> &global, Event &event)`: As above, but now the local size is determined automatically (OpenCL only).
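
Putting this together, a typical sequence for setting arguments and launching a kernel might look as follows. This is a sketch, not taken verbatim from the samples: it assumes the `Kernel` constructor takes the compiled `program` and the kernel's name (see the constructor description above, partly collapsed in this diff), that `Event` is default-constructible, and that `queue` and `device_mem` exist as in the README; the kernel name `"my_kernel"` is hypothetical.

```c++
auto kernel = CLCudaAPI::Kernel(program, "my_kernel");
auto global = std::vector<size_t>{1024};      // total number of threads
auto local = std::vector<size_t>{64};         // threads per OpenCL work-group / CUDA thread-block
auto event = CLCudaAPI::Event();
kernel.SetArguments(device_mem);              // sets all arguments in one go, starting at index 0
kernel.Launch(queue, global, local, event);   // asynchronous: may return before the kernel completes
// OpenCL only: kernel.Launch(queue, global, event) uses a default work-group size
```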
