Failures when compiling with nvcc on develop #908

Closed
2 tasks done
lifflander opened this issue Jul 3, 2020 · 3 comments · Fixed by #919
@lifflander
Collaborator

lifflander commented Jul 3, 2020

Describe the bug
Several compilation failures, per issue #904, show up in PR #907, which adds CUDA/nvcc to the CI targets.

Failures seen so far:

  • RDMA Handles (handle.index.impl.h & handle.node.impl.h)
  • Message serialize (something related to NonSerializedMsg)
/vt/src/vt/messaging/message/message_serialize.h:78:89: error: use of deleted function 'void vt::messaging::NonSerializedMsg<MsgT, SelfT>::serialize(SerializerT&) [with SerializerT = checkpoint::Sizer&; MsgT = vt::collective::reduce::operators::ReduceTMsg<vt::vrt::collection::balance::LoadData>; SelfT = vt::vrt::collection::balance::ProcStatsMsg]'
 static constexpr auto const has_own_serialize =
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~                                         ^
/vt/src/vt/messaging/message/message_serialize.h:282:8: note: declared here
   void serialize(SerializerT& s) = delete;
        ^~~~~~~~~

Add more to this list as we see them.
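
For context, the error above arises where a trait probes for a `serialize` member that has been declared `= delete`. A minimal sketch of that kind of probe, assuming a `decltype`-based detector, is shown below. The names `Sizer`, `WithSerialize`, `DeletedSerialize`, and `has_usable_serialize` are hypothetical stand-ins, not the actual vt or checkpoint types. On a conforming compiler, naming the deleted member inside the partial specialization is a substitution failure, so the trait yields `false` rather than a hard error.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a serializer type; not checkpoint::Sizer.
struct Sizer {};

// A type with a usable serialize member.
struct WithSerialize {
  template <typename SerializerT>
  void serialize(SerializerT&) {}
};

// A type whose serialize member is explicitly deleted.
struct DeletedSerialize {
  template <typename SerializerT>
  void serialize(SerializerT&) = delete;
};

// C++14-compatible void_t.
template <typename...>
struct make_void { using type = void; };
template <typename... Ts>
using void_t = typename make_void<Ts...>::type;

// Primary template: no usable serialize found.
template <typename T, typename SerializerT, typename = void>
struct has_usable_serialize : std::false_type {};

// Selected only if the call expression is well-formed. Selecting a deleted
// function makes the expression ill-formed, which is a substitution failure
// here because it occurs in the immediate context.
template <typename T, typename SerializerT>
struct has_usable_serialize<
  T, SerializerT,
  void_t<decltype(
    std::declval<T&>().serialize(std::declval<SerializerT&>())
  )>
> : std::true_type {};

static_assert(has_usable_serialize<WithSerialize, Sizer>::value,
              "a regular member is detected");
static_assert(!has_usable_serialize<DeletedSerialize, Sizer>::value,
              "a deleted member is rejected without a hard error");
```

If the same `decltype` expression is evaluated outside such a substitution context, selecting the deleted overload is a hard error of the kind reported above.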

@lifflander
Collaborator Author

I've fixed the RDMA handle issue by moving the SFINAE overloads.

pnstickne added a commit that referenced this issue Jul 6, 2020
- decltype being invoked in NVCC (only?!?) in what appears to
  be outside of SFINAE context. Try to ensure this is no longer
  the case..
pnstickne added a commit that referenced this issue Jul 6, 2020
- NVCC warning on constexpr assignment of -1 to unsigned type.
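
A minimal sketch of one way to avoid this kind of warning, assuming the constant is a "no value" sentinel of an unsigned type. `IndexType` and the constant names are hypothetical and not taken from the vt sources; the actual commit may have used a different approach.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical unsigned index type, for illustration only.
using IndexType = std::uint64_t;

// The pattern that relies on an implicit signed-to-unsigned conversion,
// which some compilers warn about:
//   constexpr IndexType no_index = -1;

// Alternative 1: state the intended value directly.
constexpr IndexType no_index = std::numeric_limits<IndexType>::max();

// Alternative 2: keep the -1 idiom but make the conversion explicit.
constexpr IndexType no_index_cast = static_cast<IndexType>(-1);

static_assert(no_index == no_index_cast,
              "both spellings produce the maximum unsigned value");
```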
pnstickne added a commit that referenced this issue Jul 6, 2020
- Moving out the first test.. or perhaps even
  yield a better message.
pnstickne added a commit that referenced this issue Jul 6, 2020
- Really, not much of an idea. About to ensure that this
  behavior triggers quirks and type-checks are disabled.
pnstickne added a commit that referenced this issue Jul 6, 2020
- Something is odd.
  The first case should not have selected any types in which
  there was a deleted member..
pnstickne added a commit that referenced this issue Jul 7, 2020
- Something is odd.
  The first case should not have selected any types in which
  there was a deleted member..
pnstickne added a commit that referenced this issue Jul 7, 2020
@pnstickne
Contributor

There is a failure much later from checkpoint:

2020-07-07T08:03:43.3550814Z [250/441] Building CXX object tests/CMakeFiles/test_rdma_handle.dir/unit/rdma/test_rdma_handle.cc.o
2020-07-07T08:03:43.3551079Z FAILED: tests/CMakeFiles/test_rdma_handle.dir/unit/rdma/test_rdma_handle.cc.o 
2020-07-07T08:03:43.3552849Z /usr/bin/ccache /nvcc_wrapper/build/nvcc_wrapper  -DFMT_HEADER_ONLY=1 -DFMT_USE_USER_DEFINED_LITERALS=0 -DHAS_DETECTION_COMPONENT=1 -I/vt/tests/unit -isystem /vt/tests/extern/googletest/googletest/include -I/vt/lib/fmt -I/vt/lib/CLI -isystem /vt/tests/extern/googletest/googletest -Irelease -I/vt/src -isystem /usr/local/include -isystem /build/checkpoint/install/include -isystem /build/detector/install/include -Wall -pedantic -Wshadow -Wno-unknown-pragmas -Wsign-compare -O3 -DNDEBUG   -fdiagnostics-color=always -std=c++1y -fPIC -fopenmp -std=c++14 -MD -MT tests/CMakeFiles/test_rdma_handle.dir/unit/rdma/test_rdma_handle.cc.o -MF tests/CMakeFiles/test_rdma_handle.dir/unit/rdma/test_rdma_handle.cc.o.d -o tests/CMakeFiles/test_rdma_handle.dir/unit/rdma/test_rdma_handle.cc.o -c /vt/tests/unit/rdma/test_rdma_handle.cc
2020-07-07T08:03:43.3553672Z nvcc_wrapper does not accept standard flags -std=c++1y since partial standard flags and standards after C++14 are not supported. nvcc_wrapper will use -std=c++14 instead. It is undefined behavior to use this flag. This should only be occurring during CMake configuration.
2020-07-07T08:03:43.3554245Z nvcc_wrapper - *warning* you have set multiple standard flags (-std=c++1* or --std=c++1*), only the last is used because nvcc can only accept a single std setting
2020-07-07T08:03:43.3555795Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch_serializer_nonbyte.h: In instantiation of 'void checkpoint::dispatch::SerializerDispatchNonByte<SerializerT, T, Dispatcher>::apply(SerializerT&, T*, checkpoint::SerialSizeType, checkpoint::dispatch::SerializerDispatchNonByte<SerializerT, T, Dispatcher>::hasNotVirtualSerialize<U>*) [with U = vt::rdma::impl::HandleData; SerializerT = checkpoint::Sizer; T = vt::rdma::impl::HandleData; Dispatcher = checkpoint::dispatch::BasicDispatcher<checkpoint::Sizer, vt::rdma::impl::HandleData>; checkpoint::SerialSizeType = long unsigned int; checkpoint::dispatch::SerializerDispatchNonByte<SerializerT, T, Dispatcher>::hasNotVirtualSerialize<U> = vt::rdma::impl::HandleData]':
2020-07-07T08:03:43.3557352Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch_serializer_nonbyte.h:99:13:   required from 'void checkpoint::dispatch::SerializerDispatchNonByte<SerializerT, T, Dispatcher>::operator()(SerializerT&, T*, checkpoint::SerialSizeType) [with U = vt::rdma::impl::HandleData; SerializerT = checkpoint::Sizer; T = vt::rdma::impl::HandleData; Dispatcher = checkpoint::dispatch::BasicDispatcher<checkpoint::Sizer, vt::rdma::impl::HandleData>; checkpoint::SerialSizeType = long unsigned int]'
2020-07-07T08:03:43.3558853Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch_serializer_byte.h:107:9:   required from 'void checkpoint::dispatch::SerializerDispatchByte<SerializerT, T, Dispatcher>::operator()(SerializerT&, T*, checkpoint::SerialSizeType, checkpoint::dispatch::SerializerDispatchByte<SerializerT, T, Dispatcher>::isNotByteCopyType<U>*) [with U = vt::rdma::impl::HandleData; SerializerT = checkpoint::Sizer; T = vt::rdma::impl::HandleData; Dispatcher = checkpoint::dispatch::BasicDispatcher<checkpoint::Sizer, vt::rdma::impl::HandleData>; checkpoint::SerialSizeType = long unsigned int; checkpoint::dispatch::SerializerDispatchByte<SerializerT, T, Dispatcher>::isNotByteCopyType<U> = vt::rdma::impl::HandleData]'
2020-07-07T08:03:43.3560017Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch.impl.h:71:3:   required from 'static TraverserT& checkpoint::dispatch::Traverse::with(T&, TraverserT&, checkpoint::SerialSizeType) [with T = vt::rdma::impl::HandleData; TraverserT = checkpoint::Sizer; checkpoint::SerialSizeType = long unsigned int]'
2020-07-07T08:03:43.3560890Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch.impl.h:55:32:   required from 'Serializer& checkpoint::operator|(Serializer&, T&) [with Serializer = checkpoint::Sizer; T = vt::rdma::impl::HandleData]'
2020-07-07T08:03:43.3561728Z /vt/src/vt/collective/reduce/operators/default_msg.h:95:3:   required from 'void vt::collective::reduce::operators::ReduceDataMsg<DataType>::serialize(SerializeT&) [with SerializeT = checkpoint::Sizer; DataType = vt::rdma::impl::HandleData]'
2020-07-07T08:03:43.3562293Z /vt/src/vt/messaging/message/message_serialize.h:396:16:   [ skipping 10 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]
2020-07-07T08:03:43.3563727Z /vt/src/vt/collective/reduce/operators/default_op.impl.h:64:1:   required from 'static void vt::collective::reduce::operators::ReduceCombine<T>::msgHandler(MsgT*) [with MsgT = vt::rdma::impl::ConstructMsg<short unsigned int, (vt::rdma::HandleEnum)1, vt::objgroup::proxy::Proxy<vt::tests::unit::TestObjGroup> >; Op = vt::collective::reduce::operators::PlusOp<vt::rdma::impl::HandleData>; ActOp = vt::collective::reduce::operators::ReduceCallback<vt::rdma::impl::ConstructMsg<short unsigned int, (vt::rdma::HandleEnum)1, vt::objgroup::proxy::Proxy<vt::tests::unit::TestObjGroup> > >; T = void]'
2020-07-07T08:03:43.3565836Z /vt/src/vt/objgroup/proxy/proxy_objgroup.h:134:21:   required by substitution of 'template<class OpT, class MsgPtrT, class MsgT, void (* f)(MsgT*)> void vt::objgroup::proxy::Proxy<vt::rdma::Manager>::reduce<OpT, MsgPtrT, MsgT, f>(MsgPtrT, vt::Callback<MsgT>, vt::objgroup::proxy::Proxy<vt::rdma::Manager>::ReduceStamp) const [with OpT = vt::collective::reduce::operators::PlusOp<vt::rdma::impl::HandleData>; MsgPtrT = vt::rdma::impl::ConstructMsg<short unsigned int, (vt::rdma::HandleEnum)1, vt::objgroup::proxy::Proxy<vt::tests::unit::TestObjGroup> >*; MsgT = vt::rdma::impl::ConstructMsg<short unsigned int, (vt::rdma::HandleEnum)1, vt::objgroup::proxy::Proxy<vt::tests::unit::TestObjGroup> >; void (* f)(MsgT*) = vt::collective::reduce::operators::ReduceCombine<>::msgHandler<vt::rdma::impl::ConstructMsg<short unsigned int, (vt::rdma::HandleEnum)1, vt::objgroup::proxy::Proxy<vt::tests::unit::TestObjGroup> >, vt::collective::reduce::operators::PlusOp<vt::rdma::impl::HandleData>, vt::collective::reduce::operators::ReduceCallback<vt::rdma::impl::ConstructMsg<short unsigned int, (vt::rdma::HandleEnum)1, vt::objgroup::proxy::Proxy<vt::tests::unit::TestObjGroup> > > >]'
2020-07-07T08:03:43.3567104Z /vt/src/vt/rdmahandle/manager.impl.h:139:1:   required from 'vt::rdma::Handle<T, E> vt::rdma::Manager::makeHandleCollectiveObjGroup(ProxyT, std::size_t, bool) [with T = short unsigned int; vt::rdma::HandleEnum E = (vt::rdma::HandleEnum)1; ProxyT = vt::objgroup::proxy::Proxy<vt::tests::unit::TestObjGroup>; std::size_t = long unsigned int]'
2020-07-07T08:03:43.3567993Z /vt/src/vt/objgroup/proxy/proxy_objgroup.impl.h:169:128:   required from 'vt::rdma::Handle<T> vt::objgroup::proxy::Proxy<ObjT>::makeHandleRDMA(std::size_t, bool) const [with T = short unsigned int; ObjT = vt::tests::unit::TestObjGroup; std::size_t = long unsigned int]'
2020-07-07T08:03:43.3568938Z /vt/tests/unit/rdma/test_rdma_handle.cc:64:61:   required from 'vt::HandleRDMA<T, short int> vt::tests::unit::TestObjGroup::makeHandle(std::size_t, bool) [with T = short unsigned int; vt::HandleRDMA<T, short int> = vt::rdma::Handle<short unsigned int, (vt::rdma::HandleEnum)1, short int, void>; std::size_t = long unsigned int]'
2020-07-07T08:03:43.3569818Z /vt/tests/unit/rdma/test_rdma_handle.cc:245:50:   required from 'void vt::tests::unit::gtest_suite_TestRDMAHandle_::test_rdma_handle_5<gtest_TypeParam_>::TestBody() [with gtest_TypeParam_ = short unsigned int]'
2020-07-07T08:03:43.3570051Z /vt/src/vt/group/region/group_shallow_list.h:55:8:   required from here
2020-07-07T08:03:43.3571117Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch_serializer_nonbyte.h:218:19: error: no matching function for call to 'checkpoint::dispatch::SerializerDispatchNonByte<checkpoint::Sizer, vt::rdma::impl::HandleData, checkpoint::dispatch::BasicDispatcher<checkpoint::Sizer, vt::rdma::impl::HandleData> >::applyStatic(checkpoint::Sizer&, vt::rdma::impl::HandleData*&, checkpoint::SerialSizeType&)'
2020-07-07T08:03:43.3571377Z      return applyStatic(s, val, num);
2020-07-07T08:03:43.3571533Z         ~~~~~~~~~~~^~~~~~~~~~~~~
2020-07-07T08:03:43.3571870Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch_serializer_nonbyte.h:163:1: note: candidate: template<class U> void checkpoint::dispatch::SerializerDispatchNonByte<SerializerT, T, Dispatcher>::applyStatic(SerializerT&, T*, checkpoint::SerialSizeType, checkpoint::dispatch::SerializerDispatchNonByte<SerializerT, T, Dispatcher>::hasInSerialize<U>*) [with U = U; SerializerT = checkpoint::Sizer; T = vt::rdma::impl::HandleData; Dispatcher = checkpoint::dispatch::BasicDispatcher<checkpoint::Sizer, vt::rdma::impl::HandleData>]
2020-07-07T08:03:43.3572168Z    void applyStatic(
2020-07-07T08:03:43.3572310Z  ^ ~~~~~~~~~
2020-07-07T08:03:43.3572502Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch_serializer_nonbyte.h:163:1: note:   template argument deduction/substitution failed:
2020-07-07T08:03:43.3572859Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch_serializer_nonbyte.h:177:1: note: candidate: template<class U> void checkpoint::dispatch::SerializerDispatchNonByte<SerializerT, T, Dispatcher>::applyStatic(SerializerT&, T*, checkpoint::SerialSizeType, checkpoint::dispatch::SerializerDispatchNonByte<SerializerT, T, Dispatcher>::hasNoninSerialize<U>*) [with U = U; SerializerT = checkpoint::Sizer; T = vt::rdma::impl::HandleData; Dispatcher = checkpoint::dispatch::BasicDispatcher<checkpoint::Sizer, vt::rdma::impl::HandleData>]
2020-07-07T08:03:43.3573159Z    void applyStatic(
2020-07-07T08:03:43.3573297Z  ^ ~~~~~~~~~
2020-07-07T08:03:43.3647914Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch_serializer_nonbyte.h:177:1: note:   template argument deduction/substitution failed:
2020-07-07T08:03:43.3648322Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch_serializer_nonbyte.h:190:1: note: candidate: template<class U> void checkpoint::dispatch::SerializerDispatchNonByte<SerializerT, T, Dispatcher>::applyStatic(SerializerT&, T*, checkpoint::SerialSizeType, checkpoint::dispatch::SerializerDispatchNonByte<SerializerT, T, Dispatcher>::isEnum<U>*) [with U = U; SerializerT = checkpoint::Sizer; T = vt::rdma::impl::HandleData; Dispatcher = checkpoint::dispatch::BasicDispatcher<checkpoint::Sizer, vt::rdma::impl::HandleData>]
2020-07-07T08:03:43.3648816Z    void applyStatic(
2020-07-07T08:03:43.3654477Z  ^ ~~~~~~~~~
2020-07-07T08:03:43.3654760Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch_serializer_nonbyte.h:190:1: note:   template argument deduction/substitution failed:
2020-07-07T08:03:43.3655695Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch_serializer_nonbyte.h:218:31: error: return-statement with a value, in function returning 'void' [-fpermissive]
2020-07-07T08:03:43.3655911Z      return applyStatic(s, val, num);
2020-07-07T08:03:43.3656091Z 
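
The "no matching function" error in this log is what an overload set looks like when every candidate is constrained and the type satisfies none of the constraints. The sketch below reproduces that shape under simplifying assumptions: it uses `std::enable_if_t` on the return type rather than the trailing pointer parameter seen in the log, has two overloads instead of three, and all names (`Sizer`, `Dispatch`, `can_dispatch`, `PlainData`, and so on) are hypothetical rather than checkpoint's.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical, simplified stand-in for a sizing serializer.
struct Sizer { std::size_t bytes = 0; };

template <typename...>
struct make_void { using type = void; };
template <typename... Ts>
using void_t = typename make_void<Ts...>::type;

// Detects a member serialize(Sizer&).
template <typename T, typename = void>
struct has_member_serialize : std::false_type {};
template <typename T>
struct has_member_serialize<
  T, void_t<decltype(std::declval<T&>().serialize(std::declval<Sizer&>()))>
> : std::true_type {};

struct Dispatch {
  // Candidate 1: viable only when T has a member serialize.
  template <typename T>
  static std::enable_if_t<has_member_serialize<T>::value>
  applyStatic(Sizer& s, T* val) { val->serialize(s); }

  // Candidate 2: viable only when T is an enum.
  template <typename T>
  static std::enable_if_t<std::is_enum<T>::value>
  applyStatic(Sizer& s, T*) { s.bytes += sizeof(T); }
};

// True when at least one applyStatic candidate accepts T.
template <typename T, typename = void>
struct can_dispatch : std::false_type {};
template <typename T>
struct can_dispatch<
  T, void_t<decltype(
    Dispatch::applyStatic(std::declval<Sizer&>(), std::declval<T*>())
  )>
> : std::true_type {};

struct PlainData { int a; int b; };  // no serialize member, not an enum
struct SerializableData {
  int a = 0;
  void serialize(Sizer& s) { s.bytes += sizeof(a); }
};
enum class Color { Red };

static_assert(!can_dispatch<PlainData>::value,
              "no candidate is viable: a direct call would not compile");
static_assert(can_dispatch<SerializableData>::value, "candidate 1 is viable");
static_assert(can_dispatch<Color>::value, "candidate 2 is viable");
```

In this sketch, a type like `PlainData` only compiles through the dispatcher if it gains a `serialize` member or is routed down a separate byte-copy path before reaching `applyStatic`.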

pnstickne added a commit that referenced this issue Jul 7, 2020
- NVCC warning on constexpr assignment of -1 to unsigned type.
pnstickne added a commit that referenced this issue Jul 7, 2020
- Usage of decltype for a deleted member in ~some~ SFINAE
  contexts is failing with an error instead of failing
  the substitution.

  Pulling out a pre-member check AND using conjunction
  instead of 'and' appears to appease the compiler.
  Neither by itself is sufficient, nor is moving the
  conjunction inside the has_own_member template sufficient.
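
The difference between a `conjunction` trait and the `and` operator can be sketched as follows. This is a generic illustration of short-circuit instantiation on a conforming compiler, not the actual vt change. `conjunction` here is a C++14 reimplementation of the C++17 `std::conjunction`, and `has_value_type`, `value_type_is_integral`, and the test types are hypothetical.

```cpp
#include <type_traits>

template <typename...>
struct make_void { using type = void; };
template <typename... Ts>
using void_t = typename make_void<Ts...>::type;

// C++14 reimplementation of std::conjunction: later arguments are only
// instantiated if every earlier one has a true ::value.
template <typename...>
struct conjunction : std::true_type {};
template <typename B>
struct conjunction<B> : B {};
template <typename B, typename... Bs>
struct conjunction<B, Bs...>
  : std::conditional_t<bool(B::value), conjunction<Bs...>, B> {};

// Cheap pre-check: does T have a nested value_type at all?
template <typename T, typename = void>
struct has_value_type : std::false_type {};
template <typename T>
struct has_value_type<T, void_t<typename T::value_type>> : std::true_type {};

// Second check: instantiating this for a T without value_type is a hard
// error, because the failure is not in an immediate substitution context.
template <typename T>
struct value_type_is_integral : std::is_integral<typename T::value_type> {};

// With `has_value_type<T>::value && value_type_is_integral<T>::value`,
// both operands are instantiated, so T = int would fail to compile.
// With conjunction, the second trait is never instantiated for T = int.
template <typename T>
using has_integral_value_type =
  conjunction<has_value_type<T>, value_type_is_integral<T>>;

struct IntBox { using value_type = int; };
struct FloatBox { using value_type = float; };

static_assert(has_integral_value_type<IntBox>::value, "both checks pass");
static_assert(!has_integral_value_type<FloatBox>::value, "second check fails");
static_assert(!has_integral_value_type<int>::value,
              "pre-check fails and the second trait is never instantiated");
```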
pnstickne added a commit that referenced this issue Jul 12, 2020
- Makes it easier to identify/change.
pnstickne added a commit that referenced this issue Jul 12, 2020
pnstickne added a commit that referenced this issue Jul 13, 2020
- RdmaHandle is used in serialization. However it is neither a primitive,
  nor is it marked as byte-copyable.

  This type appears to be forcing serialization in VT with NVCC,
  and the rules for 'is byte copyable' in VT might need to be unified
  back with checkpoint.
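
A minimal sketch of what "marked as byte-copyable" could look like, assuming an opt-in nested alias that a trait inspects. The trait `is_byte_copyable`, the alias name `isByteCopyable` as used here, and the types `HandleLike` and `Unmarked` are hypothetical illustrations and should not be read as the exact rules used by VT or checkpoint.

```cpp
#include <type_traits>

template <typename...>
struct make_void { using type = void; };
template <typename... Ts>
using void_t = typename make_void<Ts...>::type;

// Hypothetical rule: arithmetic primitives are byte-copyable by default;
// class types must opt in explicitly.
template <typename T, typename = void>
struct is_byte_copyable : std::is_arithmetic<T> {};

// Opt-in: a nested alias `isByteCopyable` decides the answer.
template <typename T>
struct is_byte_copyable<T, void_t<typename T::isByteCopyable>>
  : T::isByteCopyable {};

// A handle-like type that opts in, so it can be copied as raw bytes
// instead of being routed through a serialize call.
struct HandleLike {
  using isByteCopyable = std::true_type;
  int node;
  unsigned long id;
};

// Neither a primitive nor marked: falls through to the serialization path.
struct Unmarked { int node; };

// An opt-in marker is only safe if the type really is trivially copyable.
static_assert(std::is_trivially_copyable<HandleLike>::value,
              "marker must match the type's actual layout guarantees");
static_assert(is_byte_copyable<HandleLike>::value, "marked type opts in");
static_assert(!is_byte_copyable<Unmarked>::value, "unmarked type does not");
static_assert(is_byte_copyable<int>::value, "primitives qualify by default");
```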
pnstickne added a commit that referenced this issue Jul 14, 2020
- These fail in NVCC 11, but passed in 10.1..
pnstickne added a commit that referenced this issue Jul 14, 2020
- Try to trace down where NVCC is divergent.
pnstickne added a commit that referenced this issue Jul 14, 2020
- This should force the issue where NVCC appears to be attempting
  to serialize such messages.
pnstickne added a commit that referenced this issue Jul 14, 2020
- NVCC is not able to infer these usages.

  10.1 infers more/better than 11; the ping-pong example did not require changes in 10.1.
pnstickne added a commit that referenced this issue Jul 14, 2020
- This might force the issue where NVCC appears to be attempting
  to serialize such messages.

  The RdmaData/RdmaType messages were previously shown to be
  trivially byte-copyable to NVCC via a static assert.
pnstickne added a commit that referenced this issue Jul 28, 2020
- Makes it easier to identify/change.
pnstickne added a commit that referenced this issue Jul 28, 2020
- NVCC is not able to infer these usages.

  10.1 infers more/better than 11; the ping-pong example did not require changes in 10.1.
pnstickne added a commit that referenced this issue Jul 28, 2020
- RdmaHandle is used in serialization. However it is neither a primitive,
  nor is it marked as byte-copyable.

  This type appears to be forcing serialization in VT with NVCC,
  and the rules for 'is byte copyable' in VT might need to be unified
  back with checkpoint.
pnstickne added a commit that referenced this issue Jul 28, 2020
- NVCC warning on constexpr assignment of -1 to unsigned type.
pnstickne added a commit that referenced this issue Jul 28, 2020
- Usage of decltype for a deleted member in ~some~ SFINAE
  contexts is failing with an error instead of failing
  the substitution.

  Pulling out a pre-member check AND using conjunction
  instead of 'and' appears to appease the compiler.
  Neither by itself is sufficient, nor is moving the
  conjunction inside the has_own_member template sufficient.
@PhilMiller PhilMiller linked a pull request Jul 28, 2020 that will close this issue
lifflander pushed a commit that referenced this issue Jul 28, 2020
- NVCC warning on constexpr assignment of -1 to unsigned type.
lifflander pushed a commit that referenced this issue Jul 28, 2020
- Usage of decltype for a deleted member in ~some~ SFINAE
  contexts is failing with an error instead of failing
  the substitution.

  Pulling out a pre member check AND using conjunction
  instead of 'and' appears to appease the compiler.
  Neither by itself is sufficient, nor is moving the
  conjunction inside the has_own_member template sufficient.
lifflander pushed a commit that referenced this issue Jul 28, 2020
- Makes it easier to identify/change.
lifflander pushed a commit that referenced this issue Jul 28, 2020
- NVCC is not able to infer these usages.

  10.1 infers more/better than 11; the ping-pong example did not require changes in 10.1.
lifflander pushed a commit that referenced this issue Jul 28, 2020
- RdmaHandle is used in serialization. However it is neither a primitive,
  nor is it marked as byte-copyable.

  This type appears to be forcing serialization in VT with NVCC,
  and the rules for 'is byte copyable' in VT might need to be unified
  back with checkpoint.
lifflander pushed a commit that referenced this issue Jul 29, 2020
- NVCC warning on constexpr assignment of -1 to unsigned type.
lifflander pushed a commit that referenced this issue Jul 29, 2020
- Usage of decltype for a deleted member in ~some~ SFINAE
  contexts is failing with an error instead of failing
  the substitution.

  Pulling out a pre member check AND using conjunction
  instead of 'and' appears to appease the compiler.
  Neither by itself is sufficient, nor is moving the
  conjunction inside the has_own_member template sufficient.
lifflander pushed a commit that referenced this issue Jul 29, 2020
- Makes it easier to identify/change.
lifflander pushed a commit that referenced this issue Jul 29, 2020
- NVCC is not able to infer these usages.

  10.1 infers more/better than 11; the ping-pong example did not require changes in 10.1.
lifflander pushed a commit that referenced this issue Jul 29, 2020
- RdmaHandle is used in serialization. However it is neither a primitive,
  nor is it marked as byte-copyable.

  This type appears to be forcing serialization in VT with NVCC,
  and the rules for 'is byte copyable' in VT might need to be unified
  back with checkpoint.
@PhilMiller
Member

nvcc is now tested in CI, and the necessary changes have been merged to develop, so I am closing this issue.
