Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Add support for half type #78

Merged
merged 2 commits into from
Apr 16, 2019

Conversation

Fznamznon
Copy link
Contributor

No description provided.

bader
bader previously approved these changes Apr 11, 2019
Copy link
Contributor

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huge contribution!
I think they are a few details to improve.
Some other refactoring stuff could be postpone for the future.

sycl/include/CL/sycl/intel/sub_group.hpp Outdated Show resolved Hide resolved
#include <CL/sycl/multi_ptr.hpp>

// 4.10.1: Scalar data types
// 4.10.2: SYCL vector types

#ifdef __SYCL_DEVICE_ONLY__
typedef _Float16 half;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace all the typedef by using

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately in this file so many typedefs so I will replace typedef by using where it does violate style of near code. Replacing of all typedefs by usings seems not in scope of this PR...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I need to make a PR to fix some old coding style. :-)

typename std::enable_if<
std::is_fundamental<Ty>::value ||
std::is_same<typename std::remove_const<Ty>::type, half>::value,
vec &>::type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another meta-function to factorize out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I need to enable this operator only if Rhs has scalar type (or half) and distinguish unoptimized version of it from optimized since optimized version is not available for host half. If you have suggestion as we can re-write it more pretty, please let me know.

@@ -391,7 +427,10 @@ template <typename Type, int NumElements> class vec {
}

template <typename Ty = DataT>
typename std::enable_if<std::is_fundamental<Ty>::value, vec &>::type
typename std::enable_if<
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to comment the functions you add/change :-)

const EnableIfNotHostHalf<Ty> Arg1, const DataT Arg2, const DataT Arg3,
const DataT Arg4, const DataT Arg5, const DataT Arg6, const DataT Arg7,
const DataT Arg8, const DataT Arg9, const DataT ArgA, const DataT ArgB,
const DataT ArgC, const DataT ArgD, const DataT ArgE, const DataT ArgF)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems very ad-hoc to deal with all the cases.
Any hope to have a variadic template solution to handle all the cases?
But I agree it seems out of the scope of this PR since it is history...

sycl/test/basic_tests/half_type.cpp Outdated Show resolved Hide resolved
sycl/test/basic_tests/half_type.cpp Outdated Show resolved Hide resolved
sycl/test/basic_tests/half_type.cpp Outdated Show resolved Hide resolved
sycl/test/basic_tests/half_type.cpp Outdated Show resolved Hide resolved
sycl/test/sub_group/load_store.cpp Outdated Show resolved Hide resolved
uint16_t Exp16 = 0;
uint16_t Frac16 = Frac32 >> 13;

if (__builtin_expect(Exp32 == 0xff || Exp32Diff > 15, 0)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the expected :-) compiler to have __builtin_expect() available?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tell the compiler that chances are that this branch will not be reached.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. But the problem I see is that it is not standard C++ and it is not guarded by any compiler-specific preprocessor guard...
Is it to be executed only on the device side?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation only works on host side. On device side, we use _Float16 from clang.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So see my question above which is not about the meaning of __builtin_expect() but about what are the compilers supporting this extension? MSVC, ICC, GCC, Clang,... ?
The problem is that SYCL allows to use a compiler different from Clang/LLVM for the host side.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could argue that this is SYCL runtime library (not part of the SYCL headers), which can be deployed to user in binary form and "a compiler for the host side" is not relevant, but it's quite simple thing to fix by checking if __builtin_extect is supported: https://clang.llvm.org/docs/LanguageExtensions.html#feature-checking-macros.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand then that you envision that the SYCL host runtime should be different according to the host compiler?
I could argue that this seems difficult to manage at scale.
So I prefer to agree with the end of your sentence. :-)

sycl/test/basic_tests/half_type.cpp Outdated Show resolved Hide resolved

inline bool bitwise_comparison_fp32(const half val, const uint32_t exp) {
const float fp32 = static_cast<float>(val);
return *reinterpret_cast<const uint32_t *>(&fp32) == exp;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right you need the reference.

Signed-off-by: Mariya Podchishchaeva <[email protected]>
Half type is defined on SYCL device as _Float16. This type is mangled by
clang as "DF16_". OclCxxRewrite doesn't support this mangling and skips
all OpenCL built-ins with _Float16 type. So these built-ins are not
translated properly to SPIRV without this change.

Signed-off-by: Mariya Podchishchaeva <[email protected]>
@Fznamznon Fznamznon force-pushed the private/mpodchis/half branch from 82ad152 to 1e6ab92 Compare April 15, 2019 07:53
@romanovvlad romanovvlad merged commit 22173c9 into intel:sycl Apr 16, 2019
bb-sycl pushed a commit that referenced this pull request Aug 12, 2019
Summary:
feature coverage is a useful signal that is available during the merge
process, but was not printed previously.

Output example:

```
$ ./fuzzer -use_value_profile=1 -merge=1 new_corpus/ seed_corpus/
INFO: Seed: 1676551929
INFO: Loaded 1 modules   (2380 inline 8-bit counters): 2380 [0x90d180, 0x90dacc), 
INFO: Loaded 1 PC tables (2380 PCs): 2380 [0x684018,0x68d4d8), 
MERGE-OUTER: 180 files, 78 in the initial corpus
MERGE-OUTER: attempt 1
INFO: Seed: 1676574577
INFO: Loaded 1 modules   (2380 inline 8-bit counters): 2380 [0x90d180, 0x90dacc), 
INFO: Loaded 1 PC tables (2380 PCs): 2380 [0x684018,0x68d4d8), 
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 1048576 bytes
MERGE-INNER: using the control file '/tmp/libFuzzerTemp.111754.txt'
MERGE-INNER: 180 total files; 0 processed earlier; will process 180 files now
#1	pulse  cov: 134 ft: 330 exec/s: 0 rss: 37Mb
#2	pulse  cov: 142 ft: 462 exec/s: 0 rss: 38Mb
#4	pulse  cov: 152 ft: 651 exec/s: 0 rss: 38Mb
#8	pulse  cov: 152 ft: 943 exec/s: 0 rss: 38Mb
#16	pulse  cov: 520 ft: 2783 exec/s: 0 rss: 39Mb
#32	pulse  cov: 552 ft: 3280 exec/s: 0 rss: 41Mb
#64	pulse  cov: 576 ft: 3641 exec/s: 0 rss: 50Mb
#78	LOADED cov: 602 ft: 3936 exec/s: 0 rss: 88Mb
#128	pulse  cov: 611 ft: 3996 exec/s: 0 rss: 93Mb
#180	DONE   cov: 611 ft: 4016 exec/s: 0 rss: 155Mb
MERGE-OUTER: succesfull in 1 attempt(s)
MERGE-OUTER: the control file has 39741 bytes
MERGE-OUTER: consumed 0Mb (37Mb rss) to parse the control file
MERGE-OUTER: 9 new files with 80 new features added; 9 new coverage edges
```

Reviewers: hctim, morehouse

Reviewed By: morehouse

Subscribers: delcypher, #sanitizers, llvm-commits, kcc

Tags: #llvm, #sanitizers

Differential Revision: https://reviews.llvm.org/D66030

llvm-svn: 368617
iclsrc pushed a commit that referenced this pull request Sep 21, 2023
…… (#67069)

We noticed some performance issue while in lldb-vscode for grabing the
name of the SBValue. Profiling shows SBValue::GetName() can cause
synthetic children provider of shared/unique_ptr to deference underlying
object and complete it type.

This patch lazily moves the dereference from synthetic child provider's
Update() method to GetChildAtIndex() so that SBValue::GetName() won't
trigger the slow code path.

Here is the culprit slow code path:
```
...
frame #59: 0x00007ff4102e0660 liblldb.so.15`SymbolFileDWARF::CompleteType(this=<unavailable>, compiler_type=0x00007ffdd9829450) at SymbolFileDWARF.cpp:1567:25 [opt]
...
frame #67: 0x00007ff40fdf9bd4 liblldb.so.15`lldb_private::ValueObject::Dereference(this=0x0000022bb5dfe980, error=0x00007ffdd9829970) at ValueObject.cpp:2672:41 [opt]
frame #68: 0x00007ff41011bb0a liblldb.so.15`(anonymous namespace)::LibStdcppSharedPtrSyntheticFrontEnd::Update(this=0x000002298fb94380) at LibStdcpp.cpp:403:40 [opt]
frame #69: 0x00007ff41011af9a liblldb.so.15`lldb_private::formatters::LibStdcppSharedPtrSyntheticFrontEndCreator(lldb_private::CXXSyntheticChildren*, std::shared_ptr<lldb_private::ValueObject>) [inlined] (anonymous namespace)::LibStdcppSharedPtrSyntheticFrontEnd::LibStdcppSharedPtrSyntheticFrontEnd(this=0x000002298fb94380, valobj_sp=<unavailable>) at LibStdcpp.cpp:371:5 [opt]
...
frame #78: 0x00007ff40fdf6e42 liblldb.so.15`lldb_private::ValueObject::CalculateSyntheticValue(this=0x000002296c66a500) at ValueObject.cpp:1836:27 [opt]
frame #79: 0x00007ff40fdf1939 liblldb.so.15`lldb_private::ValueObject::GetSyntheticValue(this=<unavailable>) at ValueObject.cpp:1867:3 [opt]
frame #80: 0x00007ff40fc89008 liblldb.so.15`ValueImpl::GetSP(this=0x0000022c71b90de0, stop_locker=0x00007ffdd9829d00, lock=0x00007ffdd9829d08, error=0x00007ffdd9829d18) at SBValue.cpp:141:46 [opt]
frame #81: 0x00007ff40fc7d82a liblldb.so.15`lldb::SBValue::GetSP(ValueLocker&) const [inlined] ValueLocker::GetLockedSP(this=0x00007ffdd9829d00, in_value=<unavailable>) at SBValue.cpp:208:21 [opt]
frame #82: 0x00007ff40fc7d817 liblldb.so.15`lldb::SBValue::GetSP(this=0x00007ffdd9829d90, locker=0x00007ffdd9829d00) const at SBValue.cpp:1047:17 [opt]
frame #83: 0x00007ff40fc7da6f liblldb.so.15`lldb::SBValue::GetName(this=0x00007ffdd9829d90) at SBValue.cpp:294:32 [opt]
...
```

Differential Revision: https://reviews.llvm.org/D159542
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants