Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can the pointer alignment situation be improved? #23

Open
HadrienG2 opened this issue Sep 19, 2019 · 7 comments · May be fixed by #33
Open

Can the pointer alignment situation be improved? #23

HadrienG2 opened this issue Sep 19, 2019 · 7 comments · May be fixed by #33

Comments

@HadrienG2
Copy link

As the docs say, abomonation currently doesn't guarantee correct pointer alignment. This is pretty dangerous, even on x86 as rustc might be tempted to generate those evil SIMD instructions that assume the data is aligned and raise an exception otherwise someday.

I wonder if there is an API tweak we could use to improve upon this situation?

@frankmcsherry
Copy link
Member

My planned resolution to this, in the fullness of time, is to put each T type in a separate allocation that is properly aligned. This behaves a bit better with respect to allocation alignments and such, but means that de-abomonation draws from a set of allocations (which could be slices of the same common allocation, but aligning those is easier than per-record alignment logic).

But, I have been putting off re-engineering Abomonation for safety until I learned more about safety from the UB group (essentially: to learn if a safe version would be possible at all).

@frankmcsherry
Copy link
Member

We could probably consider alignment at each memcpy, and pre-pend some space before the memcpy if needed. Some care would be required to make sure that at deserialization we make the same decisions, which would probably mean making sure that the decode buffer is similarly aligned to the encode buffer. This is annoying when you are e.g. writing networking code and just get a buffer out from the kernel and would love to avoid a memcpy just to get the alignment better.

@HadrienG2
Copy link
Author

HadrienG2 commented Sep 19, 2019

Regarding network data alignment, given that on modern CPUs memory protection is managed at page granularity, I would suspect that network buffers coming from the kernel might be page-aligned. If so, that's more than enough alignment, as long as things like protocol headers and non-abomonated network payloads don't get in the way. Those should have the same alignment as on the sender's side though, so even if they do get in the way it doesn't seem terribly hard to work around it.

But given that this page alignment theory is based on hardware implementation details that a Sufficiently Clever Kernel can work around (e.g. by slicing multiple buffers targeted at the same process from a single page), as a paranoid person, I'd probably use a Cow<[u8]>-like design so that abomonation can check that kernel memory buffers are indeed as aligned as expected, and resort to a malloc-memcpy pair if it turns out not to be the case.

@HadrienG2
Copy link
Author

HadrienG2 commented Sep 20, 2019

By the way, how much do you care about the serialization output being a Write implementation? It seems to me that this interface choice, as nice as it usually can be, gets a little in the way when one tries to resolve alignment matters. A lower-level &mut [u8]-based interface (which will undoubtedly be itself complexified by streaming matters, you can't assume that a buffer large enough to store the entire object is available) might greatly ease implementation.

@HadrienG2
Copy link
Author

HadrienG2 commented Oct 14, 2019

This issue came back in my work recently, as I tried to figure out how to expose a safe interface to abomonation. So here is a tentative plan to resolve the problem without dropping the nice Write-based design for serialization (after all, we don't really need serialization output to be aligned):

  1. Treat the serialized output of abomonation as a C-style struct, where each memcpy or entomb output is treated as a struct field for alignment purposes.
  2. Have the Abomonation trait expose the required alignment of said struct for a given type T (which is basically the max of the alignment of T and the alignment of every data type which is recursively serialized by T::entomb()).
  3. Document in entomb()'s API contract that it should emit properly aligned output "struct fields" under the assumption that the output buffer was properly aligned before the memcpy, emitting padding bytes as necessary. Provide tools to make this easier, as alignment is easy to get wrong. Make our entomb() implementations follow the rules, and adjust extent() and exhume() implementations accordingly.
  4. Clarify in the trait's documentation which part of the code is responsible for emitting padding bytes between "struct fields" during encode()/entomb(), and which part of the code is responsible for discarding these padding bytes during decode()/exhume(). Adjust the implementation if necessary.
  5. Make decode return a Cow<AlignedT> instead of an &T, and document in decode()'s API contract that its input buffer should be aligned as specified by our newly added Abomonation::alignment() contract, or else a buffer reallocation + memcpy will need to happen. Inside of decode(), implement this via an alignment check.
  6. In debug builds, make exhume() also check, via a debug assertion, that its input buffer is properly aligned, as this can help greatly when testing and debugging Abomonation impls. This could be part of the aforementioned common tools to help Abomonation implementors get alignment right.

@HadrienG2
Copy link
Author

HadrienG2 commented Oct 14, 2019

As a first step, it is also possible to skip the Cow part from the initial implementation, and just panic if the input buffer is badly aligned. But I would rather resolve this problem eventually, since the caller of a panicking-on-misalignment decode() would need to do something akin to what Cow does anyway.

(Also, we may actually need something more custom than litteral use of the standard Cow abstraction, because the aforementioned design needs over-sized and over-aligned allocations, and as far as I understand that may not be natively supported by Cow.)

@HadrienG2 HadrienG2 linked a pull request Oct 28, 2019 that will close this issue
@tarcieri
Copy link

FYI, an informational advisory (RUSTSEC-2021-0120) was filed against abomination relating to the unsoundness reported in this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants