Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix UB in atomics with automatic storage #2586

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

wmaxey
Copy link
Member

@wmaxey wmaxey commented Oct 16, 2024

Description

Closes #2585

See #478 for more details.

This fixes UB when using automatic storage with atomics (which has little or no use-case)

Example of users encountering wrong values in automatic storage: https://stackoverflow.com/questions/75640847/what-exactly-does-cudaatomic-does

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@wmaxey wmaxey requested review from a team as code owners October 16, 2024 16:52
@wmaxey wmaxey self-assigned this Oct 16, 2024
// uses inline PTX to bypass __isLocal.
_LIBCUDACXX_BEGIN_NAMESPACE_STD

_CCCL_DEVICE inline bool __cuda_is_local(const void* __ptr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we prefix those with __cccl?

{
return false;
}
if (0 == __atomic_memcmp((const void*) __ptr, (const void*) __expected, sizeof(_Type)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please swap

Suggested change
if (0 == __atomic_memcmp((const void*) __ptr, (const void*) __expected, sizeof(_Type)))
if (__atomic_memcmp((const void*) __ptr, (const void*) __expected, sizeof(_Type)) == 0)

}
memcpy((void*) __ptr, (void const*) &__val, sizeof(_Type));
return true;
#else
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing the comments

Comment on lines +88 to +103
template <class _Type>
_CCCL_DEVICE bool __cuda_load_weak_if_local(const volatile _Type* __ptr, _Type* __ret)
{
#if defined(_LIBCUDACXX_ATOMIC_UNSAFE_AUTOMATIC_STORAGE)
if (!__cuda_is_local((const void*) __ptr))
{
return false;
}
memcpy((void*) __ret, (void const*) __ptr, sizeof(_Type));
// Required to workaround a compiler bug, see nvbug/4064730
__nanosleep(0);
return true;
#else
return false;
#endif
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: This is only needing the size of _Type

Could we pass in a void* and either template this on size_t or pass it as an argument?

Applies below

_CCCL_DEVICE bool
__cuda_compare_exchange_weak_if_local(volatile _Type* __ptr, _Type* __expected, const _Type* __desired, bool* __success)
{
if (!__cuda_is_local((const void*) __ptr))
Copy link
Collaborator

@gonzalobg gonzalobg Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are missing the #if defined(_LIBCUDACXX_ATOMIC_UNSAFE_AUTOMATIC_STORAGE).
I'd expect that we'd want it here for the same reason we want it in the other ops.

And ditto for the ops below.

(If we don't need it, I don't remember why)

// Enable bypassing automatic storage checks in atomics when using CTK 12.2 and below and if NDEBUG is defined.
# if defined(_CCCL_CUDACC_BELOW_12_2) && !defined(NDEBUG)
# define _LIBCUDACXX_ATOMIC_UNSAFE_AUTOMATIC_STORAGE
# endif // _CCCL_CUDACC_BELOW_12_2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gevtushenko fyi (I remember you had feedback about this)

@@ -154,6 +154,8 @@ static inline _CCCL_DEVICE bool __atomic_compare_exchange_cuda(_Type* __ptr, _Ty
__proxy_t* __ptr_proxy = reinterpret_cast<__proxy_t*>(__ptr);
__proxy_t* __exp_proxy = reinterpret_cast<__proxy_t*>(__exp);
__proxy_t* __des_proxy = reinterpret_cast<__proxy_t*>(&__des);
bool __res = false;
if (__cuda_compare_exchange_weak_if_local(__ptr_proxy, __exp_proxy, __des_proxy, &__res)) {{return __res;}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably a n00b question, but why the double braces {{}}?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Review
Development

Successfully merging this pull request may close these issues.

[BUG]: Invalid results with automatic storage when using std::atomic
4 participants