Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Add mdspan reference implementation #299

Merged
merged 7 commits into from
Feb 14, 2023
Merged

Add mdspan reference implementation #299

merged 7 commits into from
Feb 14, 2023

Conversation

youyu3
Copy link
Contributor

@youyu3 youyu3 commented Aug 9, 2022

  • Pulls the mdspan reference implementation from branch "stable" of the kokkos repo, https://github.com/kokkos/mdspan, up to PR 172.
  • Uglified internal identifiers and made some naming convention updates.

int main(int, char**)
{
// workaround to avoid compilation errors
cuda::std::experimental::detail::__make_dynamic_extent<int>();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@youyu3 Could you please add details of the compiler version and flags here? This feels very much like a compiler bug. __make_dynamic_extent<T>() literally just returns T(-1). Its definition should be available, so it shouldn't need to be explicitly instantiated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this is due to a compiler bug too. I'll create a reproducer and update here.

Copy link

@mhoemmen mhoemmen Aug 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@youyu3 If __make_dynamic_extent doesn't work, we could try this:

template<class... Integrals>
  requires((is_­convertible_­v<Integrals, size_­t> && ...))
    explicit extents(Integrals... indices) -> extents<size_t, size_t(indices, -1)...>;

What matters here is on the right side of the ->. The (C++17) fold expression size_t(indices, -1)... expands to something like size_t(-1), size_t(-1), ..., size_t(-1), where the number of -1s is sizeof...(indices).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revision:

template <class... IndexTypes>
extents(IndexTypes...)
 -> extents<std::size_t, std::size_t((IndexTypes(), -1))...>;

The comma expression needed an extra (), else it was being interpreted as a cast with two arguments (not a valid cast).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Pushed the workaround.

#include <cuda/std/array>
#ifndef __cuda_std__
#include <array>
#endif
#ifdef __cpp_lib_span
#include <span>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to protect #include <span> with #ifndef __cuda_std__ ... #endif ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a libcudacxx version of span (or I didn't find one).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@youyu3 Oh, oof : - ( Should we file an issue to add one? We would need it for mdspan just for a few things that take span as input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we have and it worked for my initial tests. More tests to be added.

@@ -73,21 +73,21 @@ class layout_left::mapping {
struct __rank_count {};

template <::std::size_t _r, ::std::size_t _Rank, class _Ip, class... _Indices>
__MDSPAN_INLINE_FUNCTION
__MDSPAN_HOST_DEVICE
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@youyu3 I'm a bit confused -- doesn't MDSPAN_INLINE_FUNCTION include MDSPAN_HOST_DEVICE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. But I'm only adding __host__ __device__ while matching the declaration before the change, where the function wasn't declared as inline.


static_assert( m.is_exhaustive() == true );

assert( m.extents().rank() == 2 );
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[OPTIONAL] rank() or rank_dynamic() are always static constexpr and thus can be checked with static_assert.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to static_assert.


stdex::layout_right::mapping<stdex::extents<size_t,dyn, dyn>> m{stdex::dextents<size_t,2>{16, 32}};

static_assert( m.is_exhaustive() == true );
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_exhaustive() happens to be static constexpr for layout_right, but please note that this is not generally true for all layout mappings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I'll change this to assert then.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's OK here -- I'm just pointing this out in case you want to write generic tests. Thanks!

@miscco
Copy link
Collaborator

miscco commented Sep 20, 2022

Generally speaking I am extremely concerned about the implementation complexity here.

I only looked at extends in detail, because I did implement it previously.

Especially compile times are going to be incredibly difficult here.

Also what I am generally wondering is why we are putting this in namespace stdex. It was merged into the standard after all

@mhoemmen
Copy link

Hi @miscco ! We talked a bit about extents offline. An excellent first step would be to run the benchmarks both with the current version, and with the current version with your extents implementation substituted in. It would also be interesting to see if your implementation performs well at lower optimization levels. (This would cover the important use case of debug builds not being horribly slow, which is important for game development.)

Also what I am generally wondering is why we are putting this in namespace stdex. It was merged into the standard after all

This is a user library (not a vendor's implementation of the Standard Library). Thus, it's not allowed to put things in namespace std.

mdspan was merged into C++23. The reference mdspan implementation works with C++14, 17, and 20 as well. For those back-porting use cases, it's important that we don't break applications' builds, in case those applications adopt a later compiler version that provides an mdspan implementation. Thus, we don't want to collide with std::mdspan.

@mhoemmen
Copy link

mhoemmen commented Sep 20, 2022

@miscco I wonder if some of the complexity of the reference mdspan implementation comes from an earlier requirement to back-port to C++11 (with its much more restrictive requirements for constexpr functions). That requirement has since been dropped. The reference mdspan implementation currently requires at least C++14. I'll file an issue in the reference mdspan implementation to investigate this.

@crtrott
Copy link

crtrott commented Sep 20, 2022

I think the comment about extents could be simpler are fair. Part of the problem was trying to work around a number of issues with empty base optimization in MSVC and Intel compiler. MSVC did get much better though in the last couple years. That said: looking at some assembly it does look like the reference implementation compiles cleaner:

https://godbolt.org/z/Gvsfrn31o

@crtrott
Copy link

crtrott commented Sep 20, 2022

Thinking more about this: there was a lot of work on extents, to essentially make the .extent(i) function optimize really well. Without the complication I think you end up with something which goes through some loop structures to figure out which element to access. I.e. you kinda rebuild the mapping for index to "which element from the dynamic extents array do I need to access" every time or so? But am not 100% sure.

@miscco
Copy link
Collaborator

miscco commented Sep 20, 2022

Yeah I believe my _Dynamic_index function is the issue, I will try to memoize that in a second static array, so there is no runtime cost

@crtrott
Copy link

crtrott commented Sep 20, 2022

Note: with -O3 and GCC 11 it does compile into the same instruction sequence. I just told Mark, I am totally open to entertaining a whole sale replacement on our side. I think we should underpin this with some data down the line, for example we are working on using mdspan in Kokkos which then would make a whole lot of applications use it. If we collect data from that side regarding compile time/binary size/performance and from your applications we should have a good basis to make decisions.

@jrhemstad
Copy link
Collaborator

Can we defer any significant refactoring discussion to future work/PR? I don't want to delay @youyu3's PR any more than necessary.

@mhoemmen
Copy link

@jrhemstad Don't worry! These investigations should not affect the current release. Discussions continue on the reference mdspan implementation issue kokkos/mdspan#190 .

// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be sure, is that license compatible with the usual one.

Do we have SWIPAT clearance for adding. Binary distribution clauses scare me a lot

@youyu3 youyu3 changed the title Adding mdspan reference implementation Add mdspan reference implementation Sep 30, 2022
#include <experimental/mdspan>
#include <cassert>
#include <array>
#include <span>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <span>
#ifdef __cpp_lib_span
#include <span>
#endif

I don't think you can even include <span> unless that feature test macro is defined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Mark, you are right. However, I don't think an inclusion of <span> is even needed here, since it's already included by <mdspan>. Tests ran fine without this line.
Actually, #include <array> is not needed here either.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@youyu3 That's reasonable, though please note that mdspan only includes it if __cpp_lib_span is defined. For example: https://github.com/kokkos/mdspan/blob/9e2b68a3be59c867587cdfbb641153199c813ff5/include/experimental/__p0009_bits/standard_layout_static_array.hpp#L56

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm aware of that. And any test that uses span should be guarded by #ifdef __cpp_lib_span as well (which has been fixed).

@jrhemstad
Copy link
Collaborator

Hey @youyu3 sorry for the last minute change, but based on some internal conversation, I think we can drop the experimental namespace for mdspan.

@youyu3
Copy link
Contributor Author

youyu3 commented Oct 3, 2022

Hey @youyu3 sorry for the last minute change, but based on some internal conversation, I think we can drop the experimental namespace for mdspan.

Okay. Will do. Then I suppose that we should move the files out of the experimental sub-directory as well?

@@ -52,9 +52,6 @@
#endif

_LIBCUDACXX_BEGIN_NAMESPACE_STD
namespace experimental {

__MDSPAN_INLINE_VARIABLE constexpr auto dynamic_extent = ::std::numeric_limits<::std::size_t>::max();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed when span is not available. Fixed in a later commit...

@jrhemstad jrhemstad requested review from mhoemmen and wmaxey and removed request for mhoemmen November 7, 2022 18:06
@@ -0,0 +1,64 @@
//===----------------------------------------------------------------------===//
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent test!

@mhoemmen
Copy link

mhoemmen commented Nov 7, 2022

It's far too much for me to review in detail, but I'm really impressed and happy with all the new unit tests! Thank you also for addressing those code comments we talked about earlier today!

@wmaxey wmaxey added the testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS). label Nov 8, 2022
Copy link
Collaborator

@miscco miscco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not fully through with the product code.

That said, I want to reiterate, that I do not believe this is code is maintainable for us. I accept that we currently do not have capacity to write it from scratch and working code is definitely better than no code.

Nevertheless, I would want a commitment that we actually get the time to clean this up once we get the big features we are working on right now out of the window.

Before we can merge it we would need to remove all remnants of std:: and make sure we are using the proper namespaces throughout.

include/cuda/std/detail/libcxx/include/__mdspan/macros.hpp Outdated Show resolved Hide resolved
include/cuda/std/detail/libcxx/include/__mdspan/macros.hpp Outdated Show resolved Hide resolved
include/cuda/std/detail/libcxx/include/__mdspan/macros.hpp Outdated Show resolved Hide resolved
include/cuda/std/detail/libcxx/include/__mdspan/macros.hpp Outdated Show resolved Hide resolved
}

__MDSPAN_INLINE_FUNCTION_DEFAULTED
constexpr __compressed_pair() noexcept = default;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conventionally, we do not decorate defaulted or delete functions, so this would be
__compressed_pair() = default;

That said this is definitely something we can do afterwards on our own, so no change requested

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miscco wrote:

Conventionally, we do not decorate defaulted or delete functions....

Does this mean that =defaulted functions do not need the decoration? Historically, we've had issues with that.

* @tparam _ValsSeq
* @tparam __sentinal
*/
template <class _Tag, class _Tp, class _static_t, class _ValsSeq, _static_t __sentinal = static_cast<_static_t>(dynamic_extent),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole file is pain

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miscco I can never get a certain someone to spell "sentinel" right ; - ) .

Copy link
Member

@wmaxey wmaxey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit history appears to be broken. I'll try to resolve this locally and possibly issue a correction, but AFAICT there seems to be a replay of several commits.

@youyu3
Copy link
Contributor Author

youyu3 commented Nov 15, 2022

I pulled commits from the span PR. There are further changes in those files I believe.

Commit history appears to be broken. I'll try to resolve this locally and possibly issue a correction, but AFAICT there seems to be a replay of several commits.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All tests will likely need //UNSUPPORTED: c++11 below the license and above the includes.

This letds lit know that the tests are not designed for pre-C++14.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://blossom.nvidia.com/sw-cuda-cccl-instance/blue/organizations/jenkins/cluster_launch_test/detail/cluster_launch_test/386/pipeline/1129

Results are crunchy, but it looks like there is maybe only a few failure points.

I made a PR on gitlab that fixes the history. It drops the <span> PR and gets rid of some strange re-merge, but I think the content is identical.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Wesley. I can add the //UNSUPPORTED: c++11 to the github PR. Would you be able to merge that to the gitlab PR for CI?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, what CUDA versions do our CI tests cover?

@@ -534,17 +534,9 @@ struct __standard_layout_psa<_Tag, _Tp, _static_t, _CUDA_VSTD::integer_sequence<
//--------------------------------------------------------------------------

__MDSPAN_INLINE_FUNCTION_DEFAULTED
constexpr __standard_layout_psa() noexcept
#if defined(__clang__) || defined(__MDSPAN_DEFAULTED_CONSTRUCTORS_INHERITANCE_WORKAROUND)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@youyu3
Did the above change fix this issue, making this Clang work-around no longer needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my tests with clang, yes the work-around is no longer needed.

@@ -19,7 +21,7 @@ void test_span_con()
using TestFixture = TestExtents<T>;
TestFixture t;

auto s = std::span( t.dyn_sizes );
auto s = std::span<const size_t, t.dyn_sizes.size()>( t.dyn_sizes );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@youyu3 If you're using std::span, you'll need C++20. Would you consider using cuda::std::span (if it exists) instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is for testing libcudacxx in the libcxx mode. std::span is implemented in this mode.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah excellent, thank you for explaning!

@@ -49,18 +49,18 @@ int main(int, char**)
test_array_con< cuda::std::tuple_element_t< 4, extents_test_types > >();
test_array_con< cuda::std::tuple_element_t< 5, extents_test_types > >();

static_assert( is_array_cons_avail_v< cuda::std::dextents<int,2>, int , 2 > == true );
static_assert( is_array_cons_avail_v< cuda::std::dextents<int,2>, int , 2 > == true , "" );

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, these are changes to support C++14 -- C++17 implements one-argument static_assert.

@miscco miscco merged commit 639bcbc into NVIDIA:main Feb 14, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS).
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

6 participants