[SYCL][NATIVECPU][LIBCLC] Use libclc for SYCL Native CPU (#10970)
This PR allows linking to libclc when compiling for SYCL Native CPU. Currently only the `x86_64-unknown-linux-gnu` target triple is supported; additional target triples (and possibly a more versatile way of setting them) will come in follow-up PRs.

Some useful information for reviewing:

* We start using an `AddrSpaceMap` (set in `TargetInfo.cpp`) because the mangled names emitted by the device compiler need to match the names provided by `libclc`. The `AddrSpaceMap` is taken from the `PTX` target.
* Changes in `Driver` are needed to find and link to `libclc`.
* `libclc/ptx-nvidiacl/libspirv/atomic/loadstore_helpers.ll` has been split into 4 modules, one for each memory ordering constraint. Copies of these modules have been added in `generic` (because some functions in `generic/libspirv/atomic` needed them), and the split lets a target specialize a module when it does not support some orderings. Currently only a couple of functions for `acquire` and `seq_cst` have been implemented for `generic`; the others will be implemented in a follow-up PR.
* We've added a target in `libclc` for `x86_64-unknown-linux`. This has been done because some math builtins in `generic` have been defined as

```
typedef char vec __attribute__((ext_vector_type(8)));
__attribute__((overloadable)) vec __clc_native_popcount(vec x) __asm("llvm.ctpop" ".v8i" "8");
vec call(vec x) { return __clc_native_popcount(x); }
```

While this approach conveniently allows calling LLVM intrinsics directly, it does not seem to play well with the ABI for `x86_64-unknown-linux`, since it leads to this IR:

```
define dso_local double @call(double noundef %x.coerce) #0 {
entry:
  %0 = bitcast double %x.coerce to <8 x i8>
  %1 = bitcast <8 x i8> %0 to double
  %call = call double @llvm.ctpop.v8i8(double noundef %1) #8
  %2 = bitcast double %call to <8 x i8>
  %3 = bitcast <8 x i8> %2 to double
  ret double %3
}
```

This is invalid because `llvm.ctpop.v8i8` expects a vector of `i8` and not a `double`, which led to failing asserts in the compiler that prevented `libclc` from building. As a temporary workaround we have added empty files that override the files in `generic` when building for `x86_64-unknown-linux`, allowing the build to complete, even though the corresponding builtins will be missing from the library. We are working on a proper solution for this.

---------

Co-authored-by: Uwe Dolinsky <[email protected]>
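The coercion visible in the IR above can be mimicked in plain C, which may help when reviewing why the call is ill-typed. The following is only a rough sketch: `vec8`, `vec8_to_double`, `double_to_vec8`, `popcount_lanes` and `call_coerced` are hypothetical names that do not exist in `libclc`, and `memcpy` stands in for the `bitcast` instructions. It shows that a function with the coerced `double` signature has to convert back to eight `i8` lanes before the per-lane popcount can be applied, which is the step the invalid call skips.

```c
#include <stdint.h>
#include <string.h>

/* Eight i8 lanes, standing in for the <8 x i8> vector in the IR (hypothetical type). */
typedef struct { uint8_t lane[8]; } vec8;

/* Mimics `bitcast <8 x i8> ... to double`: same 64 bits, different type. */
static double vec8_to_double(vec8 v) {
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

/* Mimics `bitcast double ... to <8 x i8>`. */
static vec8 double_to_vec8(double d) {
    vec8 v;
    memcpy(&v, &d, sizeof v);
    return v;
}

/* Per-lane population count, i.e. what a ctpop over <8 x i8> computes. */
static vec8 popcount_lanes(vec8 v) {
    vec8 r;
    for (int i = 0; i < 8; ++i) {
        uint8_t x = v.lane[i];
        uint8_t n = 0;
        while (x) {
            n = (uint8_t)(n + (x & 1u));
            x = (uint8_t)(x >> 1);
        }
        r.lane[i] = n;
    }
    return r;
}

/* Wrapper with the coerced signature: it undoes the coercion, applies the
   vector operation, and coerces the result again for the return value. */
double call_coerced(double x_coerce) {
    vec8 v = double_to_vec8(x_coerce);
    return vec8_to_double(popcount_lanes(v));
}
```

In this sketch the vector operation only ever sees a `vec8`; the `double` exists purely at the function boundary.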
1 parent a162179, commit 6da4d2e
Showing 44 changed files with 810 additions and 389 deletions.
58 changes: 58 additions & 0 deletions
libclc/generic/libspirv/atomic/loadstore_helpers_acquire.ll
@@ -0,0 +1,58 @@
```
#if __clang_major__ >= 7
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5"
#else
target datalayout = "e-p:32:32-p1:64:64-p2:64:64-p3:32:32-p4:64:64-p5:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64"
#endif
; This file contains helper functions for the acquire memory ordering constraint.
; Other targets can specialize this file to account for unsupported features in their backend.

declare void @llvm.trap()

define i32 @__clc__atomic_load_global_4_acquire(i32 addrspace(1)* nocapture %ptr) nounwind alwaysinline {
entry:
  tail call void @llvm.trap()
  unreachable
}

define i32 @__clc__atomic_load_local_4_acquire(i32 addrspace(3)* nocapture %ptr) nounwind alwaysinline {
entry:
  tail call void @llvm.trap()
  unreachable
}

define i64 @__clc__atomic_load_global_8_acquire(i64 addrspace(1)* nocapture %ptr) nounwind alwaysinline {
entry:
  tail call void @llvm.trap()
  unreachable
}

define i64 @__clc__atomic_load_local_8_acquire(i64 addrspace(3)* nocapture %ptr) nounwind alwaysinline {
entry:
  tail call void @llvm.trap()
  unreachable
}

define i32 @__clc__atomic_uload_global_4_acquire(i32 addrspace(1)* nocapture %ptr) nounwind alwaysinline {
entry:
  %0 = load atomic volatile i32, i32 addrspace(1)* %ptr acquire, align 4
  ret i32 %0
}

define i32 @__clc__atomic_uload_local_4_acquire(i32 addrspace(3)* nocapture %ptr) nounwind alwaysinline {
entry:
  tail call void @llvm.trap()
  unreachable
}

define i64 @__clc__atomic_uload_global_8_acquire(i64 addrspace(1)* nocapture %ptr) nounwind alwaysinline {
entry:
  tail call void @llvm.trap()
  unreachable
}

define i64 @__clc__atomic_uload_local_8_acquire(i64 addrspace(3)* nocapture %ptr) nounwind alwaysinline {
entry:
  tail call void @llvm.trap()
  unreachable
}
```
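For readers more familiar with C11 atomics than with LLVM IR, the two kinds of helper in this file can be approximated as follows. This is a sketch under assumptions, not the libclc implementation: the names `sketch_uload_4_acquire` and `sketch_uload_8_acquire` are hypothetical, address spaces are ignored, and `abort()` is used as a portable stand-in for `llvm.trap`.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Rough analogue of the one implemented helper in this file:
   a 32-bit volatile atomic load with acquire ordering. */
uint32_t sketch_uload_4_acquire(volatile _Atomic uint32_t *ptr) {
    return atomic_load_explicit(ptr, memory_order_acquire);
}

/* Rough analogue of the trapping stubs: a variant that is not implemented
   yet terminates the program instead of returning a value. */
uint64_t sketch_uload_8_acquire(volatile _Atomic uint64_t *ptr) {
    (void)ptr; /* unused: the stub never touches memory */
    abort();   /* stand-in for `tail call void @llvm.trap()` + `unreachable` */
}
```

Calling the stub variant ends the program, which mirrors how the trapping helpers make an unsupported ordering fail loudly rather than silently return garbage.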
58 changes: 58 additions & 0 deletions
libclc/generic/libspirv/atomic/loadstore_helpers_release.ll
@@ -0,0 +1,58 @@
```
#if __clang_major__ >= 7
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5"
#else
target datalayout = "e-p:32:32-p1:64:64-p2:64:64-p3:32:32-p4:64:64-p5:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64"
#endif
; This file contains helper functions for the release memory ordering constraint.
; Other targets can specialize this file to account for unsupported features in their backend.

declare void @llvm.trap()

define void @__clc__atomic_store_global_4_release(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline {
entry:
  tail call void @llvm.trap()
  unreachable
}

define void @__clc__atomic_store_local_4_release(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline {
entry:
  tail call void @llvm.trap()
  unreachable
}

define void @__clc__atomic_store_global_8_release(i64 addrspace(1)* nocapture %ptr, i64 %value) nounwind alwaysinline {
entry:
  tail call void @llvm.trap()
  unreachable
}

define void @__clc__atomic_store_local_8_release(i64 addrspace(3)* nocapture %ptr, i64 %value) nounwind alwaysinline {
entry:
  tail call void @llvm.trap()
  unreachable
}

define void @__clc__atomic_ustore_global_4_release(i32 addrspace(1)* nocapture %ptr, i32 %value) nounwind alwaysinline {
entry:
  tail call void @llvm.trap()
  unreachable
}

define void @__clc__atomic_ustore_local_4_release(i32 addrspace(3)* nocapture %ptr, i32 %value) nounwind alwaysinline {
entry:
  tail call void @llvm.trap()
  unreachable
}

define void @__clc__atomic_ustore_global_8_release(i64 addrspace(1)* nocapture %ptr, i64 %value) nounwind alwaysinline {
entry:
  tail call void @llvm.trap()
  unreachable
}

define void @__clc__atomic_ustore_local_8_release(i64 addrspace(3)* nocapture %ptr, i64 %value) nounwind alwaysinline {
entry:
  tail call void @llvm.trap()
  unreachable
}
```
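Every helper in this release module currently traps. As a point of comparison only, the semantics a non-trapping 32-bit release store would have can be sketched in C11. The name `sketch_ustore_4_release` is hypothetical, address spaces are ignored, and nothing here is taken from the follow-up work mentioned in the commit message.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical analogue of a 32-bit volatile atomic store with release
   ordering: writes made before this store become visible to a thread
   that later observes `value` through an acquire load of `*ptr`. */
void sketch_ustore_4_release(volatile _Atomic uint32_t *ptr, uint32_t value) {
    atomic_store_explicit(ptr, value, memory_order_release);
}
```

A release store like this is normally paired with an acquire load of the same location, which is the ordering the acquire module covers.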