904 Add nvcc (cuda) to CI #907

Merged
merged 22 commits into from
Jul 29, 2020

lifflander
Collaborator

@lifflander lifflander commented Jul 3, 2020

Fixes #904

TODO:

  • See if we can make the container smaller... 7 GB currently!
  • Fix all the compile-time bugs (see issue #908, "Failures when compiling with nvcc on develop")
  • Once the bugs are solved, see if we can speed up compilation if it's taking too long to run correctly in the container

@lifflander
Collaborator Author

I worked around one nvcc bug on develop. But we have several more. And it's really slow to compile...

@codecov

codecov bot commented Jul 3, 2020

Codecov Report

Merging #907 into develop will not change coverage.
The diff coverage is 25.00%.

@@           Coverage Diff            @@
##           develop     #907   +/-   ##
========================================
  Coverage    77.21%   77.21%           
========================================
  Files          648      648           
  Lines        24837    24837           
========================================
  Hits         19179    19179           
  Misses        5658     5658           
| Impacted Files | Coverage Δ |
|---|---|
| src/vt/messaging/active.impl.h | 97.79% <ø> (ø) |
| src/vt/messaging/message/message_serialize.h | 100.00% <ø> (ø) |
| src/vt/rdmahandle/handle.index.impl.h | 0.00% <0.00%> (ø) |
| src/vt/rdmahandle/handle.node.impl.h | 0.00% <0.00%> (ø) |
| src/vt/rdmahandle/manager.impl.h | 0.00% <ø> (ø) |
| tests/perf/ping_pong.cc | 95.65% <100.00%> (ø) |
| tests/unit/pool/test_pool_message_sizes.cc | 100.00% <100.00%> (ø) |

@lifflander
Collaborator Author

I think we should start moving away from Travis CI. We are occasionally hitting their time limit, which causes spurious failures. To do this, we need a GitHub Action that runs the code coverage and uploads it from the container.

@pnstickne
Contributor

pnstickne commented Jul 12, 2020

After fixing void return.

2020-07-12T20:14:57.4110898Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch_serializer_nonbyte.h: In instantiation of 'void checkpoint::dispatch::SerializerDispatchNonByte<SerializerT, T, Dispatcher>::apply(SerializerT&, T*, checkpoint::SerialSizeType, checkpoint::dispatch::SerializerDispatchNonByte<SerializerT, T, Dispatcher>::hasNotVirtualSerialize<U>*) [with U = vt::rdma::impl::HandleData; SerializerT = checkpoint::Sizer; T = vt::rdma::impl::HandleData; Dispatcher = checkpoint::dispatch::BasicDispatcher<checkpoint::Sizer, vt::rdma::impl::HandleData>; checkpoint::SerialSizeType = long unsigned int; checkpoint::dispatch::SerializerDispatchNonByte<SerializerT, T, Dispatcher>::hasNotVirtualSerialize<U> = vt::rdma::impl::HandleData]':
2020-07-12T20:14:57.4112358Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch_serializer_nonbyte.h:99:6:   required from 'void checkpoint::dispatch::SerializerDispatchNonByte<SerializerT, T, Dispatcher>::operator()(SerializerT&, T*, checkpoint::SerialSizeType) [with U = vt::rdma::impl::HandleData; SerializerT = checkpoint::Sizer; T = vt::rdma::impl::HandleData; Dispatcher = checkpoint::dispatch::BasicDispatcher<checkpoint::Sizer, vt::rdma::impl::HandleData>; checkpoint::SerialSizeType = long unsigned int]'
2020-07-12T20:14:57.4113949Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch_serializer_byte.h:107:9:   required from 'void checkpoint::dispatch::SerializerDispatchByte<SerializerT, T, Dispatcher>::operator()(SerializerT&, T*, checkpoint::SerialSizeType, checkpoint::dispatch::SerializerDispatchByte<SerializerT, T, Dispatcher>::isNotByteCopyType<U>*) [with U = vt::rdma::impl::HandleData; SerializerT = checkpoint::Sizer; T = vt::rdma::impl::HandleData; Dispatcher = checkpoint::dispatch::BasicDispatcher<checkpoint::Sizer, vt::rdma::impl::HandleData>; checkpoint::SerialSizeType = long unsigned int; checkpoint::dispatch::SerializerDispatchByte<SerializerT, T, Dispatcher>::isNotByteCopyType<U> = vt::rdma::impl::HandleData]'
2020-07-12T20:14:57.4115055Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch.impl.h:71:3:   required from 'static TraverserT& checkpoint::dispatch::Traverse::with(T&, TraverserT&, checkpoint::SerialSizeType) [with T = vt::rdma::impl::HandleData; TraverserT = checkpoint::Sizer; checkpoint::SerialSizeType = long unsigned int]'
2020-07-12T20:14:57.4115911Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch.impl.h:55:32:   required from 'Serializer& checkpoint::operator|(Serializer&, T&) [with Serializer = checkpoint::Sizer; T = vt::rdma::impl::HandleData]'
2020-07-12T20:14:57.4116823Z /vt/src/vt/collective/reduce/operators/default_msg.h:95:3:   required from 'void vt::collective::reduce::operators::ReduceDataMsg<DataType>::serialize(SerializeT&) [with SerializeT = checkpoint::Sizer; DataType = vt::rdma::impl::HandleData]'
2020-07-12T20:14:57.4117330Z /vt/src/vt/messaging/message/message_serialize.h:401:16:   [ skipping 10 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]
2020-07-12T20:14:57.4118644Z /vt/src/vt/collective/reduce/operators/default_op.impl.h:64:1:   required from 'static void vt::collective::reduce::operators::ReduceCombine<T>::msgHandler(MsgT*) [with MsgT = vt::rdma::impl::ConstructMsg<short unsigned int, (vt::rdma::HandleEnum)1, vt::objgroup::proxy::Proxy<vt::tests::unit::TestObjGroup> >; Op = vt::collective::reduce::operators::PlusOp<vt::rdma::impl::HandleData>; ActOp = vt::collective::reduce::operators::ReduceCallback<vt::rdma::impl::ConstructMsg<short unsigned int, (vt::rdma::HandleEnum)1, vt::objgroup::proxy::Proxy<vt::tests::unit::TestObjGroup> > >; T = void]'
2020-07-12T20:14:57.4121075Z /vt/src/vt/objgroup/proxy/proxy_objgroup.h:134:21:   required by substitution of 'template<class OpT, class MsgPtrT, class MsgT, void (* f)(MsgT*)> void vt::objgroup::proxy::Proxy<vt::rdma::Manager>::reduce<OpT, MsgPtrT, MsgT, f>(MsgPtrT, vt::Callback<MsgT>, vt::objgroup::proxy::Proxy<vt::rdma::Manager>::ReduceStamp) const [with OpT = vt::collective::reduce::operators::PlusOp<vt::rdma::impl::HandleData>; MsgPtrT = vt::rdma::impl::ConstructMsg<short unsigned int, (vt::rdma::HandleEnum)1, vt::objgroup::proxy::Proxy<vt::tests::unit::TestObjGroup> >*; MsgT = vt::rdma::impl::ConstructMsg<short unsigned int, (vt::rdma::HandleEnum)1, vt::objgroup::proxy::Proxy<vt::tests::unit::TestObjGroup> >; void (* f)(MsgT*) = vt::collective::reduce::operators::ReduceCombine<>::msgHandler<vt::rdma::impl::ConstructMsg<short unsigned int, (vt::rdma::HandleEnum)1, vt::objgroup::proxy::Proxy<vt::tests::unit::TestObjGroup> >, vt::collective::reduce::operators::PlusOp<vt::rdma::impl::HandleData>, vt::collective::reduce::operators::ReduceCallback<vt::rdma::impl::ConstructMsg<short unsigned int, (vt::rdma::HandleEnum)1, vt::objgroup::proxy::Proxy<vt::tests::unit::TestObjGroup> > > >]'
2020-07-12T20:14:57.4122335Z /vt/src/vt/rdmahandle/manager.impl.h:139:1:   required from 'vt::rdma::Handle<T, E> vt::rdma::Manager::makeHandleCollectiveObjGroup(ProxyT, std::size_t, bool) [with T = short unsigned int; vt::rdma::HandleEnum E = (vt::rdma::HandleEnum)1; ProxyT = vt::objgroup::proxy::Proxy<vt::tests::unit::TestObjGroup>; std::size_t = long unsigned int]'
2020-07-12T20:14:57.4123234Z /vt/src/vt/objgroup/proxy/proxy_objgroup.impl.h:169:128:   required from 'vt::rdma::Handle<T> vt::objgroup::proxy::Proxy<ObjT>::makeHandleRDMA(std::size_t, bool) const [with T = short unsigned int; ObjT = vt::tests::unit::TestObjGroup; std::size_t = long unsigned int]'
2020-07-12T20:14:57.4124138Z /vt/tests/unit/rdma/test_rdma_handle.cc:64:61:   required from 'vt::HandleRDMA<T> vt::tests::unit::TestObjGroup::makeHandle(std::size_t, bool) [with T = short unsigned int; vt::HandleRDMA<T> = vt::rdma::Handle<short unsigned int, (vt::rdma::HandleEnum)1, short int, void>; std::size_t = long unsigned int]'
2020-07-12T20:14:57.4124958Z /vt/tests/unit/rdma/test_rdma_handle.cc:245:29:   required from 'void vt::tests::unit::gtest_suite_TestRDMAHandle_::test_rdma_handle_5<gtest_TypeParam_>::TestBody() [with gtest_TypeParam_ = short unsigned int]'
2020-07-12T20:14:57.4125180Z /vt/src/vt/group/region/group_shallow_list.h:55:8:   required from here
2020-07-12T20:14:57.4126241Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch_serializer_nonbyte.h:218:12: error: no matching function for call to 'checkpoint::dispatch::SerializerDispatchNonByte<checkpoint::Sizer, vt::rdma::impl::HandleData, checkpoint::dispatch::BasicDispatcher<checkpoint::Sizer, vt::rdma::impl::HandleData> >::applyStatic(checkpoint::Sizer&, vt::rdma::impl::HandleData*&, checkpoint::SerialSizeType&)'
2020-07-12T20:14:57.4126586Z      applyStatic(s, val, num);
2020-07-12T20:14:57.4126734Z      ~~~~~~~^~~~~~~~~~~~~
2020-07-12T20:14:57.4127059Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch_serializer_nonbyte.h:163:1: note: candidate: template<class U> void checkpoint::dispatch::SerializerDispatchNonByte<SerializerT, T, Dispatcher>::applyStatic(SerializerT&, T*, checkpoint::SerialSizeType, checkpoint::dispatch::SerializerDispatchNonByte<SerializerT, T, Dispatcher>::hasInSerialize<U>*) [with U = U; SerializerT = checkpoint::Sizer; T = vt::rdma::impl::HandleData; Dispatcher = checkpoint::dispatch::BasicDispatcher<checkpoint::Sizer, vt::rdma::impl::HandleData>]
2020-07-12T20:14:57.4127344Z    void applyStatic(
2020-07-12T20:14:57.4127486Z  ^ ~~~~~~~~~
2020-07-12T20:14:57.4127681Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch_serializer_nonbyte.h:163:1: note:   template argument deduction/substitution failed:
2020-07-12T20:14:57.4128047Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch_serializer_nonbyte.h:177:1: note: candidate: template<class U> void checkpoint::dispatch::SerializerDispatchNonByte<SerializerT, T, Dispatcher>::applyStatic(SerializerT&, T*, checkpoint::SerialSizeType, checkpoint::dispatch::SerializerDispatchNonByte<SerializerT, T, Dispatcher>::hasNoninSerialize<U>*) [with U = U; SerializerT = checkpoint::Sizer; T = vt::rdma::impl::HandleData; Dispatcher = checkpoint::dispatch::BasicDispatcher<checkpoint::Sizer, vt::rdma::impl::HandleData>]
2020-07-12T20:14:57.4128324Z    void applyStatic(
2020-07-12T20:14:57.4128466Z  ^ ~~~~~~~~~
2020-07-12T20:14:57.4128651Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch_serializer_nonbyte.h:177:1: note:   template argument deduction/substitution failed:
2020-07-12T20:14:57.4129050Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch_serializer_nonbyte.h:190:1: note: candidate: template<class U> void checkpoint::dispatch::SerializerDispatchNonByte<SerializerT, T, Dispatcher>::applyStatic(SerializerT&, T*, checkpoint::SerialSizeType, checkpoint::dispatch::SerializerDispatchNonByte<SerializerT, T, Dispatcher>::isEnum<U>*) [with U = U; SerializerT = checkpoint::Sizer; T = vt::rdma::impl::HandleData; Dispatcher = checkpoint::dispatch::BasicDispatcher<checkpoint::Sizer, vt::rdma::impl::HandleData>]
2020-07-12T20:14:57.4129355Z    void applyStatic(
2020-07-12T20:14:57.4129497Z  ^ ~~~~~~~~~
2020-07-12T20:14:57.4129679Z /build/checkpoint/install/include/checkpoint/dispatch/dispatch_serializer_nonbyte.h:190:1: note:   template argument deduction/substitution failed:
2020-07-12T20:15:26.2342461Z [251/441] Building CXX object tests/CMakeFiles/test_mpi_access_guards.dir/unit/main.cc.o
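
The failure above has the usual shape of a SFINAE-gated overload set in which no candidate is enabled for the argument type: every `applyStatic` candidate is rejected during substitution, so the call has nothing left to bind to. A minimal sketch of that situation follows. All names in it (`Sizer`, `Dispatch`, `Widget`, `Color`, `Opaque`, `CanApply`, `countBytes`) are hypothetical stand-ins, not checkpoint's or vt's implementation, and it assumes the overloads are gated through a defaulted pointer parameter, as the candidate signatures in the log suggest.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// C++14-friendly void_t helper.
template <typename...> struct MakeVoid { using type = void; };
template <typename... Ts> using VoidT = typename MakeVoid<Ts...>::type;

// Hypothetical sizing "serializer": it only accumulates a byte count.
struct Sizer { std::size_t bytes = 0; };

// Detects a member function `serialize(Sizer&)`.
template <typename T, typename = void>
struct HasSerialize : std::false_type {};
template <typename T>
struct HasSerialize<
  T, VoidT<decltype(std::declval<T&>().serialize(std::declval<Sizer&>()))>
> : std::true_type {};

// Gates in the style of the log: each alias only names a type when its
// condition holds, otherwise substitution fails and the overload is dropped.
template <typename U>
using hasSerialize = std::enable_if_t<HasSerialize<U>::value, U>;
template <typename U>
using isEnum = std::enable_if_t<std::is_enum<U>::value, U>;

struct Dispatch {
  // Enabled only for types with a serialize member.
  template <typename U>
  static void applyStatic(Sizer& s, U* val, std::size_t num,
                          hasSerialize<U>* = nullptr) {
    for (std::size_t i = 0; i < num; i++) val[i].serialize(s);
  }
  // Enabled only for enums.
  template <typename U>
  static void applyStatic(Sizer& s, U*, std::size_t num,
                          isEnum<U>* = nullptr) {
    s.bytes += num * sizeof(U);
  }
};

// True when some applyStatic overload accepts (Sizer&, T*, size_t).
template <typename T, typename = void>
struct CanApply : std::false_type {};
template <typename T>
struct CanApply<
  T, VoidT<decltype(Dispatch::applyStatic(
       std::declval<Sizer&>(), std::declval<T*>(), std::size_t{}))>
> : std::true_type {};

struct Widget { int x = 0; void serialize(Sizer& s) { s.bytes += sizeof(x); } };
enum class Color { Red, Green };
struct Opaque { int handle = 0; };  // no serialize member, not an enum

static_assert(CanApply<Widget>::value, "serialize overload is viable");
static_assert(CanApply<Color>::value, "enum overload is viable");
static_assert(!CanApply<Opaque>::value, "no overload is viable");

// Sizes `num` elements through whichever overload is enabled for T.
template <typename T>
std::size_t countBytes(T* vals, std::size_t num) {
  Sizer s;
  Dispatch::applyStatic(s, vals, num);
  return s.bytes;
}
```

Calling `countBytes` on an `Opaque` array would fail to compile with a "no matching function" error and one "template argument deduction/substitution failed" note per candidate, which is the same pattern the log shows for `vt::rdma::impl::HandleData`.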

@pnstickne pnstickne force-pushed the 904-add-cuda-to-ci branch from f697bc0 to 8d741dd Compare July 28, 2020 07:45
* \brief Convenience using for when U is a vt::NodeType
*/
template <typename U>
using isIdx = typename std::enable_if_t<not std::is_same<U,vt::NodeType>::value>;
Contributor

@pnstickne pnstickne Jul 28, 2020

D:

Seems unfortunate having to move this inline.
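
For readers unfamiliar with the pattern under review, here is a minimal sketch of an `enable_if_t` alias next to the same condition spelled out at the point of use. It rests on assumptions: the thread does not show what the inline version looked like, `NodeType` below is a local stand-in for `vt::NodeType` (taken to be `short`), and `isNode`, `Kind`, `viaAlias`, and `viaInline` are hypothetical names. In standard C++ the two forms select the same overloads; the sketch does not reproduce the nvcc behavior that motivated the change.

```cpp
#include <type_traits>

// Assumption: stand-in for vt::NodeType, which is not defined in this thread.
using NodeType = short;

// Alias form: the condition is named once and reused.
template <typename U>
using isNode = std::enable_if_t<std::is_same<U, NodeType>::value>;
template <typename U>
using isIdx = std::enable_if_t<!std::is_same<U, NodeType>::value>;

enum class Kind { Node, Index };

// Overloads gated through the aliases.
template <typename U>
constexpr Kind viaAlias(U, isNode<U>* = nullptr) { return Kind::Node; }
template <typename U>
constexpr Kind viaAlias(U, isIdx<U>* = nullptr) { return Kind::Index; }

// The same gates written out at each use instead of going through an alias.
template <typename U>
constexpr Kind viaInline(
  U, std::enable_if_t<std::is_same<U, NodeType>::value>* = nullptr
) { return Kind::Node; }
template <typename U>
constexpr Kind viaInline(
  U, std::enable_if_t<!std::is_same<U, NodeType>::value>* = nullptr
) { return Kind::Index; }

static_assert(viaAlias(NodeType{1}) == Kind::Node, "alias picks node overload");
static_assert(viaInline(NodeType{1}) == Kind::Node, "inline picks node overload");
static_assert(viaAlias(1L) == Kind::Index, "alias picks index overload");
static_assert(viaInline(1L) == Kind::Index, "inline picks index overload");
```

The cost of the inline form is the one noted in the comment above: the condition is repeated at every use, so a later change to it has to be made in several places.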

@lifflander lifflander force-pushed the 904-add-cuda-to-ci branch from 8d741dd to b5b9c5b Compare July 28, 2020 21:37
@PhilMiller
Member

Even with this being "out-of-date", I'd suggest waiting for the CI actions to run to completion before pushing the rebase, so that the ccache bits all get digested first.

pnstickne and others added 9 commits July 29, 2020 08:52
- Makes it easier to identify/change.
- NVCC is not able to infer these usages.

  10.1 infers more/better than 11; the ping-pong example did not require changes in 10.1.
- RdmaHandle is used in serialization. However it is neither a primitive,
  nor is it marked as byte-copyable.

  This type appears to be forcing serialization in VT with NVCC,
  and the rules for 'is byte copyable' in VT might need to be unified
  back with checkpoint.
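
As an illustration of what "marked as byte-copyable" can mean, the following is a minimal sketch of an opt-in trait: a type declares a nested tag, and a byte-copy path refuses anything that is neither arithmetic nor tagged. This is an assumption-laden sketch, not the rule set used by VT or checkpoint; `IsByteCopyable`, `packBytes`, `unpackBytes`, `Unmarked`, and the `HandleData` struct here are hypothetical stand-ins.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// C++14-friendly void_t helper.
template <typename...> struct MakeVoid { using type = void; };
template <typename... Ts> using VoidT = typename MakeVoid<Ts...>::type;

// True only when T declares a nested `isByteCopyable` tag set to true_type.
template <typename T, typename = void>
struct IsByteCopyable : std::false_type {};
template <typename T>
struct IsByteCopyable<T, VoidT<typename T::isByteCopyable>>
  : T::isByteCopyable {};

// Hypothetical stand-in, not vt::rdma::impl::HandleData.
struct HandleData {
  using isByteCopyable = std::true_type;  // explicit opt-in
  int id = 0;
  std::size_t count = 0;
};

struct Unmarked { int id = 0; };  // same layout idea, but no opt-in

static_assert(IsByteCopyable<HandleData>::value, "tagged type opts in");
static_assert(!IsByteCopyable<Unmarked>::value, "untagged type does not");
static_assert(!IsByteCopyable<int>::value, "primitives carry no tag");

// Copies the object representation; rejects types that are neither
// arithmetic nor explicitly tagged.
template <typename T>
std::vector<char> packBytes(T const& value) {
  static_assert(
    std::is_arithmetic<T>::value || IsByteCopyable<T>::value,
    "T must be arithmetic or opt in via isByteCopyable"
  );
  static_assert(std::is_trivially_copyable<T>::value, "memcpy must be valid");
  std::vector<char> buf(sizeof(T));
  std::memcpy(buf.data(), &value, sizeof(T));
  return buf;
}

// Rebuilds a T from bytes produced by packBytes<T>.
template <typename T>
T unpackBytes(std::vector<char> const& buf) {
  static_assert(std::is_trivially_copyable<T>::value, "memcpy must be valid");
  T out;
  std::memcpy(&out, buf.data(), sizeof(T));
  return out;
}
```

With this arrangement, `packBytes(Unmarked{})` is rejected at compile time, so a type that is "neither a primitive, nor marked as byte-copyable" has to go through some other serialization path. Keeping the predicate in one place is one way to avoid two libraries disagreeing about which types qualify.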
@lifflander lifflander force-pushed the 904-add-cuda-to-ci branch from 7a1e36b to ce9b48f Compare July 29, 2020 15:53
Collaborator Author

Codacy Here is an overview of what got changed by this pull request:

Clones added
============
- src/vt/rdmahandle/handle.node.impl.h  16
- src/vt/rdmahandle/handle.index.impl.h  14
         

See the complete overview on Codacy

@lifflander lifflander merged commit 9ec8d37 into develop Jul 29, 2020
Development

Successfully merging this pull request may close these issues.

Add Nvidia/cuda compilers to CI
3 participants