Add support for multiple OpenCL platforms #1345

Merged: 10 commits merged into apache:master from opencl-multi-platforms on Jul 9, 2018

Conversation

@kazum commented Jun 27, 2018

This PR allows using both SDAccel and other OpenCL platforms.

When the target device is 'sdaccel', the runtime tries to use the Xilinx toolchain and to find FPGA devices. If it fails to find an SDAccel environment, it falls back to other OpenCL platforms so that unit tests can exercise the sdaccel backend.

When the target device is 'opencl', the runtime tries to find GPU and CPU devices. The behavior is the same as before.
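
A minimal sketch of this selection logic, assuming the Init(device_types, platform_name) signature that appears later in this PR's diff; TryInitPlatform is a hypothetical helper standing in for the real device-enumeration code:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns true if an OpenCL platform matching
// `platform_name` (empty = any platform) exposes a device of `device_type`.
bool TryInitPlatform(const std::string& device_type,
                     const std::string& platform_name) {
  // The real code enumerates OpenCL platforms/devices here.
  return false;
}

void Init(const std::vector<std::string>& device_types,
          const std::string& platform_name = "") {
  // 'sdaccel' target: something like Init({"accelerator"}, "Xilinx"),
  // with the caller falling back to other platforms when the Xilinx
  // environment is missing.
  // 'opencl' target: Init({"gpu", "cpu"}), the same behavior as before.
  for (const std::string& type : device_types) {
    if (TryInitPlatform(type, platform_name)) return;  // first match wins
  }
}
```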

@kazum force-pushed the opencl-multi-platforms branch 3 times, most recently from 66bee46 to e406f4c (June 27, 2018 19:21)
@kazum force-pushed the opencl-multi-platforms branch from e406f4c to a6bc7b9 (June 27, 2018 19:33)
@kazum commented Jun 27, 2018

@tqchen @tmoreau89 @comaniac Can you review this PR? Thanks.

@comaniac left a comment

The mechanism of using the CPU/GPU to test the HLS C kernel when the SDAccel environment isn't available seems reasonable to me. Only one very minor comment, listed below.

On the other hand, although I also reviewed the runtime part and it looks good to me as well, I don't have a good sense of the best way to organize those devices. Please defer to other reviewers on that part.

@@ -139,7 +139,7 @@ class MemoryAccessVerifier final : protected IRVisitor {

/// Check if a given DLDeviceType/TVMDeviceExtType value denotes GPU device.
static bool IsGPUDevice(int dev_type) {
Contributor:

Should we give this function a better name, since it now checks not only GPU but also FPGA devices?

Contributor Author:

Thanks for your comment. I've introduced IsFPGADevice() instead of modifying IsGPUDevice(). Now we can check that a memory access is not a host access with !(IsGPUDevice() || IsFPGADevice()).
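
A sketch of the resulting split, assuming the DLPack/TVM device-type enums of the time (the values below are illustrative placeholders for the real headers):

```cpp
// Illustrative stand-ins for the dlpack/TVM device-type enums.
enum DeviceType {
  kDLGPU = 2, kDLOpenCL = 4, kDLSDAccel = 6, kDLVulkan = 7, kDLMetal = 8
};

// GPU-like devices (this list is illustrative, not exhaustive).
static bool IsGPUDevice(int dev_type) {
  return dev_type == kDLGPU || dev_type == kDLOpenCL ||
         dev_type == kDLVulkan || dev_type == kDLMetal;
}

// FPGA devices, introduced by this PR.
static bool IsFPGADevice(int dev_type) {
  return dev_type == kDLSDAccel;
}

// The memory-access verifier can then treat an access as a host access
// when !(IsGPUDevice(dev_type) || IsFPGADevice(dev_type)).
```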

  });

  TVM_REGISTER_GLOBAL("module.loadfile_awsxclbin")
  .set_body([](TVMArgs args, TVMRetValue* rv) {
-   *rv = OpenCLModuleLoadFile(args[0], args[1]);
+   *rv = OpenCLModuleLoadFile(args[0], args[1], {"accelerator"}, "Xilinx");
Contributor:

Should we be using "Xilinx" rather than something more specific, like aws_f1 or a specific FPGA device number? I suspect that there might be more than a single Xilinx backend as we start to include more FPGA support.

Contributor Author:

"Xilinx" is the name of the OpenCL platform here. It's common between AWS F1 and other Xilinx FPGAs which support SDAccel.

@tmoreau89:

One question: will this flow support all three of (1) hardware emulation, (2) software emulation, and (3) actual hardware execution on the FPGA? In that case, will the runtime always check for an FPGA device, or only in case (3)? I can imagine launching a regular CPU instance with an FPGA development AMI, instead of an F1 instance, just to test OpenCL synthesis in software/hardware emulation.

Another question is how the different FPGA hardware backends should be handled by the runtime. Right now, are we aiming to support only VU9P-based F1 instances, or are you also planning on supporting other development boards?

Overall, the PR looks acceptable, although I'm not an expert on the TVM runtime. Maybe @tqchen can give a final word on the changes.

@kazum commented Jun 29, 2018

> One question: will this flow support all three of (1) hardware emulation, (2) software emulation, and (3) actual hardware execution on the FPGA? In that case, will the runtime always check for an FPGA device, or only in case (3)? I can imagine launching a regular CPU instance with an FPGA development AMI, instead of an F1 instance, just to test OpenCL synthesis in software/hardware emulation.

This flow supports all three cases. In cases (1) and (2), the Xilinx OpenCL platform tries to read emconfig.json and emulates the devices described in that file. We can use, e.g., non-F1 instances for development that way. If emconfig.json doesn't exist or the Xilinx toolchain is not set up, TVM falls back to other OpenCL platforms.
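
A hedged sketch of how a runtime might detect this setup. XCL_EMULATION_MODE and emconfig.json are standard parts of the Xilinx SDAccel flow; the helper name and the lookup order here are illustrative:

```cpp
#include <cstdlib>
#include <fstream>

// Illustrative helper: true when the SDAccel software/hardware emulation
// environment appears to be configured.
bool SDAccelEmulationConfigured() {
  // Set by the user for emulation runs in the SDAccel flow.
  const char* mode = std::getenv("XCL_EMULATION_MODE");
  if (mode == nullptr) return false;
  // emconfig.json (generated by emconfigutil) describes the emulated
  // devices; the real flow may search other directories as well.
  std::ifstream emconfig("emconfig.json");
  return emconfig.good();
}
```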

> Another question is how the different FPGA hardware backends should be handled by the runtime. Right now, are we aiming to support only VU9P-based F1 instances, or are you also planning on supporting other development boards?

The current implementation should work for all Xilinx FPGAs that support SDAccel. I haven't tested FPGA environments other than AWS F1 yet, though.

@tmoreau89:

@kazum thank you for your answers. One more question: in the case of different SDAccel FPGA targets (i.e., different part numbers), what variable would we need to change to set the device number?

@kazum commented Jun 30, 2018

@tmoreau89 Currently, the runtime always uses the first device, and this is hard-coded in:
https://github.com/dmlc/tvm/blob/a06f520/src/runtime/opencl/opencl_module.cc#L106
Supporting multiple FPGA devices is future work, and I'd be glad to send a PR for it. :)

@tmoreau89:

@kazum great. I think it's good to keep in mind that, by using the same SDAccel flow, we may be targeting multiple FPGA platforms.
I don't have anything to add. @tqchen, if the changes to the runtime look good, can you approve this PR?

  workspace_ = cl::OpenCLWorkspace::Global();
- workspace_->Init();
+ workspace_->Init(device_types, platform_name);
Member:

The current choice of platform is mutually exclusive: once we load an sdaccel file, it is no longer possible to load an OpenCL GPU module, because of the single global singleton used here.

@tqchen commented Jul 3, 2018

Sorry for the delayed review on this; see my comments.

The current way of supporting SDAccel is not desirable, because when both SDAccel and a GPU are present, the global singleton can only be initialized to one platform.

If we indeed want to support sdaccel as a device separate from opencl -- which makes sense, because OpenCL for hardware is quite different -- then we might want to create a separate singleton for SDAccel while reusing the current OpenCL runtime code.

That being said, we may not need the fake-compile feature at the moment, for the same reason: sdaccel is very different from OpenCL.

It seems we just need a way to tell whether sdaccel is available.

@tqchen added the labels "status: review in progress" and "status: need update" (Jul 3, 2018)
@kazum commented Jul 3, 2018

@tqchen Thanks for your comments. I'm okay with separating the SDAccel implementation from the OpenCL code, since the current singleton architecture cannot support multiple FPGA devices either way.

I'm thinking of adding a base class for OpenCL devices and re-implementing the GPU and SDAccel support on top of it. I don't think that requires very large code changes. Is that fine with you?

I'm also okay with dropping the fake-compile feature for now. I'll try to find a better way to test SDAccel code in CI without an SDAccel environment.

@tqchen commented Jul 3, 2018

@kazum It sounds good.

About multiple-FPGA support: I think it is possible to support multiple FPGAs in the current runtime, as long as all of those devices are added; you can then use sdaccel(0) and sdaccel(1) to indicate the devices.

@kazum commented Jul 3, 2018

@tqchen Ah, I see. I'll give it a try, thanks!

@kazum force-pushed the opencl-multi-platforms branch from a2c1051 to 7296f8e (July 5, 2018 19:42)
@kazum commented Jul 5, 2018

@tqchen I've pushed commits to separate the SDAccel runtime and remove the fake-compile feature. Could you take a look? Regarding multiple-FPGA support, I plan to send separate patches after we finish this PR.

The CI system currently fails to create the test binaries, but it looks like it is referencing an old version of libtvm.so. The same tests pass successfully in my environment.

@kazum commented Jul 5, 2018

> The CI system currently fails to create the test binaries, but it looks like it is referencing an old version of libtvm.so.

Sorry, that was caused by a bug of mine, and I've fixed the problem.

@@ -17,5 +17,14 @@ Module OpenCLModuleCreate(
    return codegen::DeviceSourceModuleCreate(data, fmt, fmap, "opencl");
  }

+ Module SDAccelModuleCreate(
Member:

Make SDAccel a subfolder of runtime/opencl, and add an optional configuration flag in CMake (USE_SDACCEL). This is mainly to minimize the runtime binary size.

- const std::shared_ptr<OpenCLWorkspace>& OpenCLWorkspace::Global() {
-   static std::shared_ptr<OpenCLWorkspace> inst = std::make_shared<OpenCLWorkspace>();
+ template <OpenCLPlatform T>
+ const std::shared_ptr<OpenCLWorkspace<T>>& OpenCLWorkspace<T>::Global() {
Member:

The implementation of a template function should go in the header file.

  }

  TVM_REGISTER_GLOBAL("device_api.opencl")
  .set_body([](TVMArgs args, TVMRetValue* rv) {
-   DeviceAPI* ptr = OpenCLWorkspace::Global().get();
+   DeviceAPI* ptr = OpenCLWorkspace<OpenCLPlatform::kGPU>::Global().get();
Member:

move sdaccel into a separate file

@tqchen commented Jul 8, 2018

@kazum I have made some follow-up comments. One general feeling I have about the current change is that templates have been abused a bit.

Actually, we could logically divide things as follows.

Common base, no templates, shared logic:

  OpenCLWorkspace
  OpenCLModuleNode
  OpenCLThreadEntry

Platform-specific subclasses:

  SDAccelWorkspace : OpenCLWorkspace
  SDAccelModuleNode : OpenCLModuleNode

The key problem we might face is that in some cases we need to get back to the global singleton reference. This can likely be resolved by adding a virtual function (GetGlobalWorkspace) and subclassing, as sketched below.
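
A condensed sketch of that shape (DeviceAPI inheritance, all OpenCL state, and the module classes are elided; only the subclassing and the virtual-singleton hook are shown):

```cpp
#include <memory>

class OpenCLWorkspace {
 public:
  virtual ~OpenCLWorkspace() = default;
  // Shared code (e.g. OpenCLModuleNode) calls this to recover the correct
  // global singleton without knowing the concrete platform.
  virtual const std::shared_ptr<OpenCLWorkspace>& GetGlobalWorkspace() {
    return Global();
  }
  static const std::shared_ptr<OpenCLWorkspace>& Global() {
    static auto inst = std::make_shared<OpenCLWorkspace>();
    return inst;
  }
  // ... common OpenCL logic: contexts, queues, memory management ...
};

class SDAccelWorkspace : public OpenCLWorkspace {
 public:
  const std::shared_ptr<OpenCLWorkspace>& GetGlobalWorkspace() override {
    return Global();
  }
  static const std::shared_ptr<OpenCLWorkspace>& Global() {
    static std::shared_ptr<OpenCLWorkspace> inst =
        std::make_shared<SDAccelWorkspace>();
    return inst;
  }
};
```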

My second comment is that we want to optionally build sdaccel as a module, without building both whenever opencl is enabled. This is to minimize the TVM runtime size, which can be important for mobile devices.

@kazum commented Jul 8, 2018

@tqchen Thanks for your comments! I've addressed them.

@tqchen left a comment

some final comments

const char* s = data_.c_str();
size_t len = data_.length();
cl_int err;
cl_program program = clCreateProgramWithSource(workspace_->context, 1, &s, &len, &err);
Member:

It would be helpful to support both CreateProgramWithSource and CreateProgramWithBinary by checking the format; this way, CreateProgram itself does not have to be a diverged function.
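
A hedged sketch of that unified dispatch (the format strings here are illustrative):

```cpp
#include <CL/cl.h>
#include <string>

// Choose the OpenCL entry point from the module format string so that
// CreateProgram itself stays a single, non-diverged function.
cl_program CreateProgram(cl_context ctx, cl_device_id dev,
                         const std::string& fmt, const std::string& data) {
  cl_int err = CL_SUCCESS;
  if (fmt == "cl") {
    // OpenCL C source: hand the text to the driver compiler.
    const char* s = data.c_str();
    size_t len = data.length();
    return clCreateProgramWithSource(ctx, 1, &s, &len, &err);
  }
  // Otherwise treat the payload as a precompiled binary
  // (e.g. an xclbin for SDAccel targets).
  const unsigned char* buf =
      reinterpret_cast<const unsigned char*>(data.data());
  size_t len = data.length();
  cl_int binary_status = CL_SUCCESS;
  return clCreateProgramWithBinary(ctx, 1, &dev, &len, &buf,
                                   &binary_status, &err);
}
```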

std::string source)
: data_(data), fmt_(fmt), fmap_(fmap), source_(source) {}
// destructor
~OpenCLModuleNode() {
Member:

Consider keeping only the declarations in the header and moving the implementations to the .cc file.

@kazum force-pushed the opencl-multi-platforms branch from e224cdc to a1e91c3 (July 8, 2018 22:48)
@kazum commented Jul 8, 2018

@tqchen Addressed, thanks!

}
}

std::shared_ptr<cl::OpenCLWorkspace> OpenCLModuleNode::GetGlobalWorkspace() {
Member:

Change the return type to

const std::shared_ptr<OpenCLWorkspace>&

so that we don't have to copy the shared_ptr every time we refer to it.

@kazum commented Jul 9, 2018

@tqchen Addressed, thanks.

@tqchen left a comment

one last change and it can be merged

- void Init();
+ void Init(const std::vector<std::string>& device_types, const std::string& platform_name = "");
+ virtual void Init() {
+   Init({"gpu", "cpu"});
Member:

Only initialize the GPU, to be consistent with the existing setting.

Contributor Author:

This basically initializes only the GPU, and falls back to the CPU only when no GPU is available. The behavior is consistent with the original one.

Member:

The main problem is that the current opencl schedules assume a GPU, so they may not work when falling back to the CPU.

Contributor Author:

Okay, I'm fine with removing the fallback feature in this PR.

@tqchen merged commit fa2e428 into apache:master on Jul 9, 2018
@tqchen added the "status: accepted" label and removed the "status: need update" and "status: review in progress" labels (Jul 9, 2018)
@tqchen commented Jul 9, 2018

Thanks @kazum, this is merged!

tqchen pushed a commit to tqchen/tvm that referenced this pull request Aug 4, 2018
sergei-mironov pushed a commit to sergei-mironov/tvm that referenced this pull request Aug 8, 2018
@kazum deleted the opencl-multi-platforms branch on August 23, 2018 02:28