This blog post accompanies my GitHub Universe video presentation on oneAPI (see "Embrace the accelerated, cross-architecture era - upstream and downstream" on Intel's Universe home page), in case you arrive here first. I will continue to add material over the next week, so if the current state seems incomplete, it may improve on its own. Alternatively, feel free to contact me to ask for the content you want to see.
- email: it's on my GitHub home page
- tweet: science_dot
- issue: create a GitHub issue against this repo to ask a question.
- Data Parallel C++ Tutorial
- Parallel Research Kernels
- Stencil Demo
- Intel DPC++ Compiler
- oneAPI GitHub Project
- oneAPI CI Examples
- Jeff's blog about getting oneAPI working on a Tiger Lake laptop
Download DPC++ from GitHub here: https://github.com/intel/llvm/. The most common way to download it is the following:
git clone https://github.com/intel/llvm.git dpcpp
You are certainly free to compile DPC++ from source on Intel platforms, but you do not need to: you can instead install it via Linux package managers, as described in Installing Intel® oneAPI Toolkits via Linux* Package Managers.
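For Debian-based distributions, the package-manager route looks roughly like the following. The repository URL, key, and package names follow Intel's published instructions, but double-check them against the page linked above, since they may change:

```shell
# Add Intel's oneAPI apt repository (Ubuntu/Debian).
# URLs and package names per Intel's install guide; verify before use.
wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
sudo apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
echo "deb https://apt.repos.intel.com/oneapi all main" | \
  sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt update
# intel-basekit pulls in the DPC++ compiler among other components;
# a smaller alternative is intel-oneapi-compiler-dpcpp-cpp.
sudo apt install intel-basekit
```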
The build for Intel processors is trivial:
python ./buildbot/configure.py
python ./buildbot/compile.py [-jN]
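Once the build finishes, you can smoke-test the freshly built compiler with a trivial SYCL program. The paths below assume the clone directory used above (dpcpp) and the default buildbot output location (build/ inside the tree); adjust them to match your layout:

```shell
# Paths assume the clone and build commands shown above; adjust as needed.
export DPCPP_HOME=$PWD/dpcpp
export PATH=$DPCPP_HOME/build/bin:$PATH
export LD_LIBRARY_PATH=$DPCPP_HOME/build/lib:$LD_LIBRARY_PATH

# Minimal SYCL program: print the name of the default device.
cat > hello-sycl.cpp <<'EOF'
#include <CL/sycl.hpp>
#include <iostream>
int main() {
  cl::sycl::queue q;
  std::cout << q.get_device().get_info<cl::sycl::info::device::name>()
            << std::endl;
  return 0;
}
EOF

clang++ -fsycl hello-sycl.cpp -o hello-sycl
./hello-sycl
```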
The build of DPC++ for CUDA (PTX back end) is straightforward. Use CUDA 10.1, 11.0, or 11.1; as of this writing, 11.2 is not yet supported. Version 10.0 is not supported but mostly works (see below for additional comments).
python ./buildbot/configure.py [--cuda]
python ./buildbot/compile.py [-jN]
I have tested DPC++ for CUDA on P100, V100, and A100. It is possible to run into problems caused by various CUDA configuration issues on Linux; if you do, report them on the DPC++ GitHub project.
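To target the PTX back end, the compile line differs from the CPU case: you select the CUDA device triple explicitly. A sketch, assuming the 2020-era toolchain (newer builds spell the triple nvptx64-nvidia-cuda, without the -sycldevice suffix, and use different device-selection environment variables):

```shell
# Compile for the CUDA back end; the triple spelling varies by
# toolchain version, so check your compiler's documentation.
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda-sycldevice \
  hello-sycl.cpp -o hello-sycl-cuda

# Ask the SYCL runtime to use the CUDA plugin (2020-era variable).
SYCL_BE=PI_CUDA ./hello-sycl-cuda
```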
I ported DPC++ to ARM in September (PR 2333), but unfortunately there has been a regression in the build system that I have not yet been able to fix, so please use my agx-works branch for now.
The ARM build is straightforward using the buildbot scripts:
python ./buildbot/configure.py --arm [--cuda]
python ./buildbot/compile.py [-j1]
If you build on an ARM+CUDA platform like Xavier AGX, you should add the --cuda option. Note that the current AGX distribution of CUDA is version 10.0, which is technically unsupported (10.1 is the minimum) and is the likely cause of a memory-deallocation issue in some programs. I am optimistic that the upcoming refresh of the AGX software distribution will address this.
If you are building on a Raspberry Pi, you need to disable parallelism (-j1) because the memory on a Pi is insufficient for parallel builds of LLVM. If you do not limit build parallelism, your Pi will become almost unresponsive and require power cycling.
TODO
I'll add answers to any questions I receive. If you ask a question in a public forum, I'll cite that, otherwise I will not attribute your question unless you specifically request it.
(c) Copyright Jeff Hammond, 2020. CC BY 4.0 license. See https://creativecommons.org/licenses/by/4.0/ for details.