From bd3fab4333f9e95680f5ed0cd931455e323e6e27 Mon Sep 17 00:00:00 2001
From: Andrew Lamb
Date: Wed, 13 Mar 2024 16:46:22 -0400
Subject: [PATCH] MINOR: [Docs] Clarify inlined strings in `VariableLengthStringView` is padded with `0` (#40512)

### Rationale for this change

While implementing `Variable-size Binary View Layout` (thanks @ ariesdevil !) in https://github.com/apache/arrow-rs/pull/5481 it was not 100% clear if the inlined string was zero padded.

@ bkietz noted that

> The spec does say "padded with zero" https://github.com/apache/arrow/blob/main/docs/source/format/Columnar.rst?plain=1#L384 but it could be repeated in the surrounding paragraph. In any case, padded with zero is definitely the intent

```
* Short strings, length <= 12
  | Bytes 0-3  | Bytes 4-15                            |
  |------------|---------------------------------------|
  | length     | data (padded with 0)                  |
```

### What changes are included in this PR?

Add a sentence in the surrounding text to make it clear the inlined strings values are zero padded

Note I do not think this is a specification change (and therefore doesn't need a vote on the mailing list) as the spec already specifies the padding is zero (in the diagram). This simply clarifies the text to emphasize this point for ease of understanding

### Are these changes tested?

### Are there any user-facing changes?

Authored-by: Andrew Lamb
Signed-off-by: Sutou Kouhei
---
 docs/source/format/Columnar.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/source/format/Columnar.rst b/docs/source/format/Columnar.rst
index 7b74b972f2ab8..0cfece2586294 100644
--- a/docs/source/format/Columnar.rst
+++ b/docs/source/format/Columnar.rst
@@ -393,7 +393,8 @@ length of the string and can be used to determine how the rest of the view
 should be interpreted.
 
 In the short string case the string's bytes are inlined — stored inside the
-view itself, in the twelve bytes which follow the length.
+view itself, in the twelve bytes which follow the length. Any remaining bytes
+after the string itself are padded with `0`.
 
 In the long string case, a buffer index indicates which data buffer
 stores the data bytes and an offset indicates where in that buffer the
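The short-string case this patch clarifies can be sketched in a few lines. This is a minimal, non-authoritative illustration in pure Python, not the arrow-rs or Arrow C++ implementation: `make_inline_view` and `read_inline_view` are hypothetical helper names, and the little-endian `int32` length is an assumption (Arrow data normally uses the platform's native byte order). It relies on Python's `struct` module, whose `s` format pads a shorter bytes value with NUL (`0x00`) bytes when packing, which yields exactly the "padded with 0" layout from the diagram above.

```python
import struct

VIEW_SIZE = 16       # every view occupies 16 bytes
MAX_INLINE_LEN = 12  # strings of length <= 12 are stored inside the view

# '<i12s' = little-endian int32 length (an assumption) followed by a 12-byte field.
# When packing, struct pads a bytes value shorter than 12 with 0x00 bytes.
_INLINE_FMT = "<i12s"


def make_inline_view(data: bytes) -> bytes:
    """Build the 16-byte view for a short string (hypothetical helper)."""
    if len(data) > MAX_INLINE_LEN:
        raise ValueError("longer than 12 bytes: needs a long-string view")
    view = struct.pack(_INLINE_FMT, len(data), data)
    assert len(view) == VIEW_SIZE
    return view


def read_inline_view(view: bytes) -> bytes:
    """Recover the inlined bytes, checking that the padding really is 0."""
    length, payload = struct.unpack(_INLINE_FMT, view)
    if not 0 <= length <= MAX_INLINE_LEN:
        raise ValueError("not a short-string view")
    if any(payload[length:]):
        raise ValueError("bytes after the string must be 0")
    return payload[:length]


# Because the padding is fixed at 0, equal short strings always produce
# identical views, so comparing whole views tests two short strings for equality.
assert make_inline_view(b"hello") == make_inline_view(b"hello")
print(make_inline_view(b"hello").hex(" "))
# prints: 05 00 00 00 68 65 6c 6c 6f 00 00 00 00 00 00 00
```

The equality shortcut holds only for short strings: a long-string view carries a buffer index and offset, so two equal long strings stored in different places can have different view bytes.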