60297: colserde: precise slicing when deserializing arrow.Data r=yuzefovich a=yuzefovich

When we deserialize data from Arrow format, we have a long `[]byte` that
contains several buffers (either 2 or 3, depending on the encoding
format). When there are 3 buffers, the second one holds the offsets and
the third one holds the actual data. Previously, when slicing out the
second buffer, we did not cap it, so that buffer's capacity extended into
the third buffer.

This shouldn't cause any issues from the perspective of GC (since both
buffers have the same lifecycle), but it does trip up our memory
accounting system when it estimates the footprint of vectors such as
Bytes that use 3 buffers, because the estimate looks at the capacities of
the underlying slices. Effectively, we were double-counting the third
buffer.

This is now fixed by capping the slice for each of the buffers. As a
result, the memory estimate will likely become smaller after
round-tripping through a converter and a serializer: the original batch
might have extra capacity in its underlying slices that will no longer be
present after deserialization.

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
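To make the capacity problem concrete, here is a minimal Go sketch. It assumes a simplified body containing only an offsets buffer followed by a data buffer, and a footprint estimate that simply sums `cap()` of each buffer. The names `splitBuffers` and `estimateFootprint` are hypothetical illustrations, not the actual colserde or memory accounting API.

```go
package main

import "fmt"

// estimateFootprint is a hypothetical stand-in for a memory accounting
// routine that sums the capacities of a vector's underlying buffers.
func estimateFootprint(bufs ...[]byte) int {
	total := 0
	for _, b := range bufs {
		total += cap(b)
	}
	return total
}

// splitBuffers (hypothetical) slices one contiguous body into an offsets
// buffer and a data buffer. When capped is true it uses a full slice
// expression (a[lo:hi:max]) so the offsets buffer's capacity ends where
// its length ends instead of running into the data buffer.
func splitBuffers(body []byte, offsetsLen int, capped bool) (offsets, data []byte) {
	if capped {
		offsets = body[:offsetsLen:offsetsLen]
	} else {
		offsets = body[:offsetsLen] // capacity extends to the end of body
	}
	data = body[offsetsLen:len(body):len(body)]
	return offsets, data
}

func main() {
	// Assumed layout: 8 bytes of offsets followed by 32 bytes of data.
	body := make([]byte, 40)

	// Uncapped: the data buffer's 32 bytes are counted twice.
	offsets, data := splitBuffers(body, 8, false)
	fmt.Println(cap(offsets), cap(data), estimateFootprint(offsets, data)) // 40 32 72

	// Capped: the estimate matches the 40 bytes actually allocated.
	offsets, data = splitBuffers(body, 8, true)
	fmt.Println(cap(offsets), cap(data), estimateFootprint(offsets, data)) // 8 32 40
}
```

The three-index slice expression limits capacity without copying, so both buffers still share the same backing array and GC behavior is unchanged; only the capacities seen by the estimate differ.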