Serialization for Kokkos containers residing in host-inaccessible memory space with limited cost #196

Open
PhilMiller opened this issue Apr 23, 2021 · 2 comments

Comments

@PhilMiller
Member

Application codes are moving away from allocating data in memory that's accessible to both host and device code to avoid performance overhead and pitfalls. We still need to be able to serialize instances of device-space containers for checkpoint/restart and messaging/communication.

This will be a concern for any view that doesn't satisfy this predicate:

template <typename ViewType>
constexpr bool isHostAccessible(const ViewType &) {
  // memory_space is a dependent type, so it needs the typename keyword
  return Kokkos::SpaceAccessibility<Kokkos::HostSpace,
                                    typename ViewType::memory_space>::accessible;
}

The most expedient implementation would be to use auto host_view = Kokkos::create_mirror_view(view_to_serialize); followed by the appropriate deep_copy (or a single call to create_mirror_view_and_copy). The problem is that, if view_to_serialize is large, this may allocate a lot of host memory and move a lot of data synchronously all at once. We may need to do that as a stop-gap measure anyway, to guarantee functionality.

A more thorough implementation would create and copy through limited-size bounce buffers in host memory to limit added memory footprint. To ensure good performance, it would use a streaming approach with an exec_space argument to Kokkos::deep_copy, so that parts of the view's contents can be serialized while other parts are being copied.
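The chunking part of the bounce-buffer idea can be sketched in plain C++. This is only a model under stated assumptions: std::memcpy stands in for the Kokkos::deep_copy from device memory, and the names copyFromDevice and serializeViaBounceBuffer are hypothetical. It is sequential; overlapping copies with serialization would need at least two bounce buffers and the exec_space overload of deep_copy, which this sketch does not attempt.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-in for a device-to-host transfer. In real code this would be a
// Kokkos::deep_copy from a piece of the device view into the bounce buffer.
inline void copyFromDevice(char* host_dst, const char* device_src,
                           std::size_t bytes) {
  std::memcpy(host_dst, device_src, bytes);
}

// Hypothetical helper: append total_bytes of "device" data to out, staging
// every piece through a bounce buffer of at most bounce_bytes.
// Returns the number of chunks that were transferred.
inline std::size_t serializeViaBounceBuffer(const char* device_src,
                                            std::size_t total_bytes,
                                            std::size_t bounce_bytes,
                                            std::vector<char>& out) {
  assert(bounce_bytes > 0);
  // Extra host memory is bounded by the bounce buffer, not by the view size.
  std::vector<char> bounce(std::min(bounce_bytes, total_bytes));
  std::size_t chunks = 0;
  for (std::size_t offset = 0; offset < total_bytes; offset += bounce.size()) {
    const std::size_t n = std::min(bounce.size(), total_bytes - offset);
    copyFromDevice(bounce.data(), device_src + offset, n);  // stage one chunk
    out.insert(out.end(), bounce.begin(),
               bounce.begin() + static_cast<std::ptrdiff_t>(n));  // serialize it
    ++chunks;
  }
  return chunks;
}
```

For example, 10 bytes pushed through a 4-byte bounce buffer are transferred as three chunks of 4, 4 and 2 bytes.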

@PhilMiller
Member Author

Serialization to a buffer inherently implies the footprint of the active data being doubled for the serialization buffer. Doing better than that requires some form of direct transfer (e.g. RDMA, GPUDirect MPI, etc).

Any approach based on mirror view construction implies a footprint up to triple the active data.

An incremental / streaming bounce buffer can offer 2x+delta footprint, where delta is constant, but may still have to be substantial to obtain good performance.
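To make the comparison concrete, the three footprint estimates above can be written out as arithmetic. The function names are hypothetical, and the mirror figure is the worst case ("up to triple"), not a measured value.

```cpp
#include <cstddef>

// Peak footprint in bytes for a view holding data_bytes of active data.

// Active data plus the serialization buffer.
constexpr std::size_t footprintSerializationBuffer(std::size_t data_bytes) {
  return 2 * data_bytes;
}

// Active data, a full-size host mirror, and the serialization buffer
// (worst case for the mirror-view approach).
constexpr std::size_t footprintMirrorView(std::size_t data_bytes) {
  return 3 * data_bytes;
}

// Active data, the serialization buffer, and a constant-size bounce buffer.
constexpr std::size_t footprintBounceBuffer(std::size_t data_bytes,
                                            std::size_t delta_bytes) {
  return 2 * data_bytes + delta_bytes;
}
```

With 1000 MB of active data and a 64 MB bounce buffer, these give 2000 MB, 3000 MB and 2064 MB respectively.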

Since we effectively assume that the underlying type is byte-copyable through use of deep_copy, we could arrange to construct a View<T***, HostSpace, MemoryUnmanaged> of the right size and full contiguity pointing directly at the desired spot in the serialization buffer, and deep_copy directly to/from that. That may be ideal, if there's nothing stopping us from making it work.

@PhilMiller
Member Author

Jonathan mentioned an increment-offset-and-get-pointer method on the serializer that should be good for the in-memory buffer use case. We'll pass it the size of the View contents to be serialized, and subtract that size back off to form the pointer that will be the base of the unmanaged host view. That view will then be the target of the deep_copy.
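A minimal sketch of what such a method could look like, in plain C++. Everything here is an assumption for illustration: BufferSerializer, advanceAndGetPointer and serializeContiguous are hypothetical names, not the serializer's actual interface, and std::memcpy stands in for the deep_copy into an unmanaged host view wrapping the returned pointer. In this version the subtraction happens inside the method, so the caller receives the base pointer directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical in-memory serializer with a fixed-capacity buffer.
class BufferSerializer {
 public:
  explicit BufferSerializer(std::size_t capacity)
      : buffer_(capacity), offset_(0) {}

  // Advance the write offset by bytes, then subtract bytes back off to get
  // the base of the region that was just reserved.
  char* advanceAndGetPointer(std::size_t bytes) {
    assert(bytes <= buffer_.size() - offset_);
    offset_ += bytes;
    return buffer_.data() + (offset_ - bytes);
  }

  std::size_t offset() const { return offset_; }
  const char* data() const { return buffer_.data(); }

 private:
  std::vector<char> buffer_;
  std::size_t offset_;
};

// Write count contiguous, byte-copyable elements straight into the
// serialization buffer, with no intermediate host copy. With Kokkos, the
// memcpy would be a deep_copy whose destination is an unmanaged host view
// constructed on dst.
template <typename T>
void serializeContiguous(BufferSerializer& s, const T* src, std::size_t count) {
  const std::size_t bytes = count * sizeof(T);
  char* dst = s.advanceAndGetPointer(bytes);
  std::memcpy(dst, src, bytes);
}
```

One thing this sketch glosses over is alignment: the returned pointer is only byte-aligned, which is fine for memcpy but is something a real implementation would have to consider before treating the region as an array of T.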
