Add support for multiple OpenCL platforms #1345
Conversation
@tqchen @tmoreau89 @comaniac Can you review this PR? Thanks.
The mechanism of using the CPU/GPU to test the HLS C kernel when the SDAccel environment isn't available seems reasonable to me. Only one very minor comment listed.
On the other hand, although I also reviewed the runtime part and it looks good to me as well, I have little sense of a good way to organize those devices. Please refer to others for this part.
src/pass/verify_memory.cc (Outdated)
@@ -139,7 +139,7 @@ class MemoryAccessVerifier final : protected IRVisitor {
/// Check if a given DLDeviceType/TVMDeviceExtType value denotes GPU device.
static bool IsGPUDevice(int dev_type) {
Should we give a better name to this function, since it now checks not only GPU but also FPGA devices?
Thanks for your comment. I've introduced IsFPGADevice() instead of modifying IsGPUDevice(). Now we can check that a memory access is not to the host with !(IsGPUDevice() || IsFPGADevice()).
src/runtime/opencl/opencl_module.cc (Outdated)
});

TVM_REGISTER_GLOBAL("module.loadfile_awsxclbin")
.set_body([](TVMArgs args, TVMRetValue* rv) {
-    *rv = OpenCLModuleLoadFile(args[0], args[1]);
+    *rv = OpenCLModuleLoadFile(args[0], args[1], {"accelerator"}, "Xilinx");
Should we be using "Xilinx" instead of something more specific, like aws_f1 or a specific FPGA device number? I suspect that there might be more than a single Xilinx backend as we start to include more support for FPGAs.
"Xilinx" is the name of the OpenCL platform here. It's common between AWS F1 and other Xilinx FPGAs which support SDAccel.
One question: will this flow support all three of (1) hw emulation, (2) sw emulation, and (3) actual hardware execution on the FPGA? Another question is how the different hardware FPGA backends should be handled by the runtime. Right now, are we aiming to support only VU9P-based F1 instances, or are you also planning on supporting other development boards? Overall, the PR looks acceptable, although I'm not an expert on the TVM runtime. Maybe @tqchen can give a final word on the changes.
This flow supports all three cases. In cases (1) and (2), the Xilinx OpenCL platform tries to read emconfig.json and emulates the devices described in that file. We can use, e.g., non-F1 instances for development that way. If emconfig.json doesn't exist or the Xilinx toolchain is not set up, TVM falls back to other OpenCL platforms.
The current implementation should work for all Xilinx FPGAs that support SDAccel. I've not tested FPGA environments other than AWS F1 yet, though.
@kazum thank you for your answers. One more question: in the case of having different SDAccel FPGA targets (i.e. different part numbers), what variable would we need to change to set the device number?
@tmoreau89 Currently, the runtime always uses the first device, and it is hard-coded in:
src/runtime/opencl/opencl_module.cc (Outdated)
  workspace_ = cl::OpenCLWorkspace::Global();
-  workspace_->Init();
+  workspace_->Init(device_types, platform_name);
The current choice of platform is mutually exclusive, so once we load an sdaccel file, it is no longer possible to load an OpenCL GPU module, due to the single global singleton being used here.
Sorry for the delayed review on this; see my comment. The current way of supporting SDAccel is not desirable, because when both SDAccel and a GPU are present, the global singleton can only be initialized to one platform. We may indeed want to support sdaccel as a device separate from opencl, which makes sense because OpenCL for hardware is quite different. In that case, we might want to create a separate singleton for SDAccel while reusing the current OpenCL runtime code. That being said, we may not need the fake compile feature at the moment, for the same reason that sdaccel is very different from OpenCL. It seems we just need a way to tell if sdaccel is available.
@tqchen Thanks for your comments. I'm okay with separating the SDAccel implementation from the OpenCL code, because the current singleton architecture cannot support multiple FPGA devices either way. I'm thinking of adding a base class for OpenCL devices and re-implementing the GPU and SDAccel support on top of it. I don't think that requires very big code changes. Is that fine with you? It's also okay with me that the fake compile feature is no longer necessary. I'll try to find a better way to test SDAccel code in CI without an SDAccel environment.
@kazum Sounds good. About multiple FPGA support: I think it is possible to support multiple FPGAs in the current runtime, as long as those devices are all added, and you can use sdaccel(0) and sdaccel(1) to indicate the devices.
@tqchen Ah, I see. I'll give it a try, thanks!
@tqchen I've pushed commits to separate the SDAccel runtime and remove the fake compile feature. Can you check them out? About multiple FPGA support, I plan to send other patches after we finish this PR. The CI system fails to create test binaries now, but it looks like it refers to an old version of libtvm.so. The same tests pass successfully in my environment.
So sorry, that was caused by my bug, and I've fixed the problem.
src/codegen/opt/build_opencl_off.cc (Outdated)
@@ -17,5 +17,14 @@ Module OpenCLModuleCreate(
  return codegen::DeviceSourceModuleCreate(data, fmt, fmap, "opencl");
}

Module SDAccelModuleCreate(
Make SDAccel into a subfolder of runtime/opencl, and add an optional configuration flag in cmake (USE_SDACCEL). This is mainly to minimize the runtime binary size.
-  const std::shared_ptr<OpenCLWorkspace>& OpenCLWorkspace::Global() {
-    static std::shared_ptr<OpenCLWorkspace> inst = std::make_shared<OpenCLWorkspace>();
+  template <OpenCLPlatform T>
+  const std::shared_ptr<OpenCLWorkspace<T>>& OpenCLWorkspace<T>::Global() {
The implementation of a template function should go into the header file.
}

TVM_REGISTER_GLOBAL("device_api.opencl")
.set_body([](TVMArgs args, TVMRetValue* rv) {
-    DeviceAPI* ptr = OpenCLWorkspace::Global().get();
+    DeviceAPI* ptr = OpenCLWorkspace<OpenCLPlatform::kGPU>::Global().get();
move sdaccel into a separate file
@kazum I have made some followup comments. One general feeling I have about the current change is that templates have been abused a bit. Actually, we could logically divide things into a common, untemplated base holding the common logic:

OpenCLWorkspace
OpenCLModuleNode
OpenCLThreadEntry

with SDAccel-specific subclasses:

SDAccelWorkspace : OpenCLWorkspace
SDAccelModuleNode : OpenCLModuleNode

The key problem we might face is that in some cases we need to get back to the global singleton reference. This can likely be resolved by adding a virtual function (GetGlobalWorkspace) and subclassing. The second comment is that we want to optionally build sdaccel as a module, without building both whenever opencl is enabled. This is to minimize the TVM runtime size, which can be important for mobile devices.
@tqchen Thanks for your comments! I've addressed them. |
some final comments
src/runtime/opencl/opencl_common.h (Outdated)
const char* s = data_.c_str();
size_t len = data_.length();
cl_int err;
cl_program program = clCreateProgramWithSource(workspace_->context, 1, &s, &len, &err);
It would be helpful to support both CreateProgramWithSource and CreateProgramWithBinary by checking the format; this way CreateProgram itself does not have to be a diverged function.
src/runtime/opencl/opencl_common.h (Outdated)
            std::string source)
    : data_(data), fmt_(fmt), fmap_(fmap), source_(source) {}
// destructor
~OpenCLModuleNode() {
Consider keeping only the declarations in the header and moving the implementations to the cc file.
@tqchen Addressed, thanks!
src/runtime/opencl/opencl_module.cc (Outdated)
  }
}

std::shared_ptr<cl::OpenCLWorkspace> OpenCLModuleNode::GetGlobalWorkspace() {
Change the return type to const std::shared_ptr<OpenCLWorkspace>&, so that when we refer to it, we don't have to copy.
@tqchen Addressed, thanks.
one last change and it can be merged
src/runtime/opencl/opencl_common.h (Outdated)
-  void Init();
+  void Init(const std::vector<std::string>& device_types, const std::string& platform_name = "");
+  virtual void Init() {
+    Init({"gpu", "cpu"});
Only initialize the GPU, to be consistent with the existing setting.
This basically initializes only the GPU, and falls back to the CPU only when a GPU is not available. The behavior is consistent with the original one.
The main problem is that the current opencl schedules assume a GPU, so they may not work when falling back to CPUs.
Okay, I'm fine with removing the fallback feature in this PR.
Thanks @kazum, this is merged!
This PR allows having both SDAccel and other OpenCL platforms.
When the target device is 'sdaccel', the runtime tries to use the Xilinx toolchain and find FPGA devices. If it fails to find an SDAccel environment, it falls back to other OpenCL platforms so that unit tests can exercise the sdaccel backend.
When the target device is 'opencl', the runtime tries to find GPU and CPU devices. The behavior is the same as before.