[SYCL] Implement basic sub-buffers support #64

Fznamznon · 2019-04-05T13:46:19Z

Signed-off-by: Mariya Podchishchaeva <[email protected]>

keryell

Nice to have this new feature! \o/

Just a few things that could be clarified.

keryell · 2019-04-05T16:00:16Z

sycl/test/basic_tests/buffer/subbuffer.cpp

+    const int N = 7;
+    int Result[M][N] = {0};
+    {
+      auto OrigRange = range<2>(M, N);


What about using modern C++ { } everywhere instead of the old, troublesome ()?
Especially if people look at tests as examples of good coding, and if we consider that SYCL is about modern C++ and heterogeneous computing... :-)
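
To illustrate the suggestion, here is a minimal sketch comparing the two spellings. Range2 is a hypothetical stand-in written for this example, not the real cl::sycl::range<2>, so the snippet compiles without a SYCL implementation.

```cpp
#include <cstddef>

// Hypothetical stand-in for cl::sycl::range<2>; NOT the real SYCL class.
struct Range2 {
  std::size_t Rows;
  std::size_t Cols;
  Range2(std::size_t R, std::size_t C) : Rows{R}, Cols{C} {}
  std::size_t size() const { return Rows * Cols; }
};

// Style used in the patch: auto plus a parenthesized constructor call.
std::size_t parenStyle() {
  auto OrigRange = Range2(6, 7);
  return OrigRange.size();
}

// Style suggested in the review: direct brace initialization.
std::size_t braceStyle() {
  Range2 OrigRange{6, 7};
  return OrigRange.size();
}
```

Both functions build the same object. One caveat worth keeping in mind: for types that have a std::initializer_list constructor (std::vector, for instance), braces can select a different constructor than parentheses, so the replacement is not always mechanical.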

keryell · 2019-04-05T16:00:36Z

sycl/test/basic_tests/buffer/subbuffer.cpp

+    {
+      auto OrigRange = range<2>(M, N);
+      buffer<int, 2> Buffer(OrigRange);
+      Buffer.set_final_data((int *)Result);


Old C cast in modern C++ code...
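
One possible way to avoid the cast altogether, sketched under the assumption that the goal is simply a pointer to the first int of the 2-D array. The helper name firstElement and the dimensions are made up for this example.

```cpp
#include <cstddef>

// Illustrative dimensions; the test under review uses M = 6 and N = 7.
constexpr std::size_t M = 6;
constexpr std::size_t N = 7;

// Hypothetical helper: returns a pointer to the first element of a 2-D array
// by taking the element's address, so no cast is needed.
int *firstElement(int (&Array)[M][N]) { return &Array[0][0]; }

// Checks that the cast-free pointer equals the one produced by the C cast.
bool samePointerAsCCast() {
  int Result[M][N] = {};
  return firstElement(Result) == (int *)Result;
}
```

If a cast is still preferred, reinterpret_cast<int *>(Result) at least makes the conversion explicit and easy to search for.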

keryell · 2019-04-05T16:04:36Z

sycl/test/basic_tests/buffer/subbuffer.cpp

+  bool Failed = false;
+  // Basic test case
+  {
+    const int M = 6;


I think you can replace every "const something" in this code with "constexpr something".
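
A small sketch of what the change buys. A const int initialized from a literal is already usable in constant expressions, so the code behaves the same; constexpr mainly makes the compiler enforce that the initializer is a compile-time constant. The helper elementCount is hypothetical.

```cpp
// Illustrative dimensions matching the values used in the test under review.
constexpr int M = 6;
constexpr int N = 7;

// A constexpr variable must be initialized by a constant expression, so it can
// always be used in static_assert and as an array bound.
static_assert(M * N == 42, "unexpected element count");

// Hypothetical helper: counts the elements of an M x N array of int.
constexpr int elementCount() {
  return static_cast<int>(sizeof(int[M][N]) / sizeof(int));
}
```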

keryell · 2019-04-05T16:09:36Z

sycl/test/basic_tests/buffer/subbuffer.cpp

+    for (size_t i = 0; i < M; ++i) {
+      for (size_t j = 0; j < N; ++j) {
+        size_t Expected =
+            ((i == 0) || (i == M - 1) || (j == 0) || (j == N - 1)) ? 0 : 1;


Not clear why a size_t. Actually a bool looks fine to me.

auto Expected = !((i == 0) || (i == M - 1) || (j == 0) || (j == N - 1));

Replacing auto with int is also possible if you think auto would cost the reader more time to understand the next line...
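
As a sanity check, the sketch below compares the ternary from the patch with a negated border check over the whole index space. The function names are invented for this example, and M and N are declared as std::size_t here (the test uses const int) to keep the comparisons free of signed/unsigned mixing.

```cpp
#include <cstddef>

// Illustrative dimensions matching the test under review.
constexpr std::size_t M = 6;
constexpr std::size_t N = 7;

// Form used in the patch: 0 on the border of the M x N grid, 1 inside.
std::size_t expectedOriginal(std::size_t i, std::size_t j) {
  return ((i == 0) || (i == M - 1) || (j == 0) || (j == N - 1)) ? 0 : 1;
}

// Boolean form: true exactly when (i, j) is not on the border.
bool expectedSuggested(std::size_t i, std::size_t j) {
  return !((i == 0) || (i == M - 1) || (j == 0) || (j == N - 1));
}

// Returns true if both forms agree for every index pair.
bool formsAgree() {
  for (std::size_t i = 0; i < M; ++i)
    for (std::size_t j = 0; j < N; ++j)
      if (expectedOriginal(i, j) !=
          static_cast<std::size_t>(expectedSuggested(i, j)))
        return false;
  return true;
}
```

Note that the negation has to cover the whole disjunction; negating only the first comparison would give a different result on most border cells.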

keryell · 2019-04-05T16:10:42Z

sycl/test/basic_tests/buffer/subbuffer.cpp

+  {
+    const int M = 10;
+    int Data[M] = {0};
+    auto OrigRange = range<1>(M);


Shorter:

range<1> OrigRange { M };

keryell · 2019-04-05T16:11:27Z

sycl/test/basic_tests/buffer/subbuffer.cpp

+    int Data[M] = {0};
+    auto OrigRange = range<1>(M);
+    buffer<int, 1> Buffer(Data, OrigRange);
+    auto Offset = id<1>(1);


Same for the next 3.

keryell · 2019-04-05T16:13:14Z

sycl/test/basic_tests/buffer/subbuffer.cpp

+    });
+    auto Acc = Buffer.get_access<cl::sycl::access::mode::read>();
+    for (size_t i = 0; i < M; ++i) {
+      size_t Expected = (i > 1 && i < M - 2) ? 1 : 0;


int Expected = i > 1 && i < M - 2;

keryell · 2019-04-05T16:14:31Z

sycl/include/CL/sycl/buffer.hpp

@@ -33,54 +33,58 @@ class buffer {

  buffer(const range<dimensions> &bufferRange,
         const property_list &propList = {})
-      : Range(bufferRange) {
+      : Range(bufferRange), MemRange(bufferRange) {


What about writing modern C++ with some { } instead of () for most of the instance constructions?

Summary: According to the new Armv8-M specification https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf the instructions SQRSHRL and UQRSHLL now have an additional immediate operand <saturate>. The new assembly syntax is:

    SQRSHRL<c> RdaLo, RdaHi, #<saturate>, Rm
    UQRSHLL<c> RdaLo, RdaHi, #<saturate>, Rm

where <saturate> can be either 64 (the existing behavior) or 48, in that case the result is saturated to 48 bits.

The new operand is encoded as follows:

    #64 Encoded as sat = 0
    #48 Encoded as sat = 1

sat is bit 7 of the instruction bit pattern.

This patch adds a new assembler operand class MveSaturateOperand which implements parsing and encoding. Decoding is implemented in DecodeMVEOverlappingLongShift.

Reviewers: ostannard, simon_tatham, t.p.northover, samparker, dmgreen, SjoerdMeijer
Reviewed By: simon_tatham
Subscribers: javed.absar, kristof.beyls, hiraditya, pbarrio, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64810
llvm-svn: 366555

Summary: feature coverage is a useful signal that is available during the merge process, but was not printed previously.

Output example:

```
$ ./fuzzer -use_value_profile=1 -merge=1 new_corpus/ seed_corpus/
INFO: Seed: 1676551929
INFO: Loaded 1 modules (2380 inline 8-bit counters): 2380 [0x90d180, 0x90dacc),
INFO: Loaded 1 PC tables (2380 PCs): 2380 [0x684018,0x68d4d8),
MERGE-OUTER: 180 files, 78 in the initial corpus
MERGE-OUTER: attempt 1
INFO: Seed: 1676574577
INFO: Loaded 1 modules (2380 inline 8-bit counters): 2380 [0x90d180, 0x90dacc),
INFO: Loaded 1 PC tables (2380 PCs): 2380 [0x684018,0x68d4d8),
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 1048576 bytes
MERGE-INNER: using the control file '/tmp/libFuzzerTemp.111754.txt'
MERGE-INNER: 180 total files; 0 processed earlier; will process 180 files now
#1 pulse cov: 134 ft: 330 exec/s: 0 rss: 37Mb
#2 pulse cov: 142 ft: 462 exec/s: 0 rss: 38Mb
#4 pulse cov: 152 ft: 651 exec/s: 0 rss: 38Mb
#8 pulse cov: 152 ft: 943 exec/s: 0 rss: 38Mb
#16 pulse cov: 520 ft: 2783 exec/s: 0 rss: 39Mb
#32 pulse cov: 552 ft: 3280 exec/s: 0 rss: 41Mb
#64 pulse cov: 576 ft: 3641 exec/s: 0 rss: 50Mb
#78 LOADED cov: 602 ft: 3936 exec/s: 0 rss: 88Mb
#128 pulse cov: 611 ft: 3996 exec/s: 0 rss: 93Mb
#180 DONE cov: 611 ft: 4016 exec/s: 0 rss: 155Mb
MERGE-OUTER: succesfull in 1 attempt(s)
MERGE-OUTER: the control file has 39741 bytes
MERGE-OUTER: consumed 0Mb (37Mb rss) to parse the control file
MERGE-OUTER: 9 new files with 80 new features added; 9 new coverage edges
```

Reviewers: hctim, morehouse
Reviewed By: morehouse
Subscribers: delcypher, #sanitizers, llvm-commits, kcc
Tags: #llvm, #sanitizers
Differential Revision: https://reviews.llvm.org/D66030
llvm-svn: 368617

Summary: The greedy register allocator occasionally decides to insert a large number of unnecessary copies, see below for an example. The -consider-local-interval-cost option (which X86 already enables by default) fixes this. We enable this option for AArch64 only after receiving feedback that this change is not beneficial for PowerPC.

We evaluated the impact of this change on compile time, code size and performance benchmarks.

This option has a small impact on compile time, measured on CTMark. A 0.1% geomean regression on -O1 and -O2, and 0.2% geomean for -O3, with at most 0.5% on individual benchmarks.

The effect on both code size and performance on AArch64 for the LLVM test suite is nil on the geomean with individual outliers (ignoring short exec_times) between:

                best    worst
    size..text  -3.3%   +0.0%
    exec_time   -5.8%   +2.3%

On SPEC CPU® 2017 (compiled for AArch64) there is a minor reduction (-0.2% at most) in code size on some benchmarks, with a tiny movement (-0.01%) on the geomean. Neither intrate nor fprate show any change in performance.

This patch makes the following changes.
- For the AArch64 target, enableAdvancedRASplitCost() now returns true.
- Ensures that -consider-local-interval-cost=false can disable the new behaviour if necessary.

This matrix multiply example:

    $ cat test.c
    long A[8][8];
    long B[8][8];
    long C[8][8];

    void run_test() {
      for (int k = 0; k < 8; k++) {
        for (int i = 0; i < 8; i++) {
          for (int j = 0; j < 8; j++) {
            C[i][j] += A[i][k] * B[k][j];
          }
        }
      }
    }

results in the following generated code on AArch64:

    $ clang --target=aarch64-arm-none-eabi -O3 -S test.c -o -
    [...]
    // %for.cond1.preheader
    // =>This Inner Loop Header: Depth=1
    add x14, x11, x9
    str q0, [sp, #16] // 16-byte Folded Spill
    ldr q0, [x14]
    mov v2.16b, v15.16b
    mov v15.16b, v14.16b
    mov v14.16b, v13.16b
    mov v13.16b, v12.16b
    mov v12.16b, v11.16b
    mov v11.16b, v10.16b
    mov v10.16b, v9.16b
    mov v9.16b, v8.16b
    mov v8.16b, v31.16b
    mov v31.16b, v30.16b
    mov v30.16b, v29.16b
    mov v29.16b, v28.16b
    mov v28.16b, v27.16b
    mov v27.16b, v26.16b
    mov v26.16b, v25.16b
    mov v25.16b, v24.16b
    mov v24.16b, v23.16b
    mov v23.16b, v22.16b
    mov v22.16b, v21.16b
    mov v21.16b, v20.16b
    mov v20.16b, v19.16b
    mov v19.16b, v18.16b
    mov v18.16b, v17.16b
    mov v17.16b, v16.16b
    mov v16.16b, v7.16b
    mov v7.16b, v6.16b
    mov v6.16b, v5.16b
    mov v5.16b, v4.16b
    mov v4.16b, v3.16b
    mov v3.16b, v1.16b
    mov x12, v0.d[1]
    fmov x15, d0
    ldp q1, q0, [x14, #16]
    ldur x1, [x10, #-256]
    ldur x2, [x10, #-192]
    add x9, x9, #64 // =64
    mov x13, v1.d[1]
    fmov x16, d1
    ldr q1, [x14, #48]
    mul x3, x15, x1
    mov x14, v0.d[1]
    fmov x17, d0
    mov x18, v1.d[1]
    fmov x0, d1
    mov v1.16b, v3.16b
    mov v3.16b, v4.16b
    mov v4.16b, v5.16b
    mov v5.16b, v6.16b
    mov v6.16b, v7.16b
    mov v7.16b, v16.16b
    mov v16.16b, v17.16b
    mov v17.16b, v18.16b
    mov v18.16b, v19.16b
    mov v19.16b, v20.16b
    mov v20.16b, v21.16b
    mov v21.16b, v22.16b
    mov v22.16b, v23.16b
    mov v23.16b, v24.16b
    mov v24.16b, v25.16b
    mov v25.16b, v26.16b
    mov v26.16b, v27.16b
    mov v27.16b, v28.16b
    mov v28.16b, v29.16b
    mov v29.16b, v30.16b
    mov v30.16b, v31.16b
    mov v31.16b, v8.16b
    mov v8.16b, v9.16b
    mov v9.16b, v10.16b
    mov v10.16b, v11.16b
    mov v11.16b, v12.16b
    mov v12.16b, v13.16b
    mov v13.16b, v14.16b
    mov v14.16b, v15.16b
    mov v15.16b, v2.16b
    ldr q2, [sp] // 16-byte Folded Reload
    fmov d0, x3
    mul x3, x12, x1
    [...]

With -consider-local-interval-cost the same section of code results in the following:

    $ clang --target=aarch64-arm-none-eabi -mllvm -consider-local-interval-cost -O3 -S test.c -o -
    [...]
    .LBB0_1: // %for.cond1.preheader
    // =>This Inner Loop Header: Depth=1
    add x14, x11, x9
    ldp q0, q1, [x14]
    ldur x1, [x10, #-256]
    ldur x2, [x10, #-192]
    add x9, x9, #64 // =64
    mov x12, v0.d[1]
    fmov x15, d0
    mov x13, v1.d[1]
    fmov x16, d1
    ldp q0, q1, [x14, #32]
    mul x3, x15, x1
    cmp x9, #512 // =512
    mov x14, v0.d[1]
    fmov x17, d0
    fmov d0, x3
    mul x3, x12, x1
    [...]

Reviewers: SjoerdMeijer, samparker, dmgreen, qcolombet
Reviewed By: dmgreen
Subscribers: ZhangKang, jsji, wuzish, ppc-slack, lkail, steven.zhang, MatzeB, qcolombet, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69437

[SYCL] Implement basic sub-buffers support

da043ab

Signed-off-by: Mariya Podchishchaeva <[email protected]>

bader approved these changes Apr 5, 2019

vladimirlaz merged commit 82fead6 into intel:sycl Apr 5, 2019

keryell reviewed Apr 5, 2019