Protocol buffer serialization format #93

Open
wanghaisheng opened this issue Jul 10, 2015 · 3 comments

Comments

@wanghaisheng

Open questions

Packed Repeated Fields

Version 2.1.0 introduced packed repeated fields, which are declared like repeated fields but with the special [packed=true] option. These function like repeated fields, but are encoded differently. A packed repeated field containing zero elements does not appear in the encoded message. Otherwise, all of the elements of the field are packed into a single key-value pair with wire type 2 (length-delimited). Each element is encoded the same way it would be normally, except without a tag preceding it.
For example, imagine you have the message type:
message Test4 {
repeated int32 d = 4 [packed=true];
}
Now let's say you construct a Test4, providing the values 3, 270, and 86942 for the repeated field d. Then, the encoded form would be:
22 // tag (field number 4, wire type 2)
06 // payload size (6 bytes)
03 // first element (varint 3)
8E 02 // second element (varint 270)
9E A7 05 // third element (varint 86942)
Only repeated fields of primitive numeric types (types which use the varint, 32-bit, or 64-bit wire types) can be declared "packed".
Note that although there's usually no reason to encode more than one key-value pair for a packed repeated field, parsers must be prepared to accept multiple key-value pairs. In this case, the payloads should be concatenated. Each pair must contain a whole number of elements.
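To make the byte layout above concrete, here is a minimal sketch of a packed encoder in Python. The function names `encode_varint` and `encode_packed_varints` are made up for this illustration and are not part of any protobuf library; the sketch only handles non-negative values.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set the MSB
        else:
            out.append(low7)         # last byte: MSB stays 0
            return bytes(out)


def encode_packed_varints(field_number: int, values: list) -> bytes:
    """Encode a packed repeated varint field (hypothetical helper)."""
    if not values:
        return b""                   # an empty packed field is not emitted
    payload = b"".join(encode_varint(v) for v in values)
    tag = encode_varint((field_number << 3) | 2)  # wire type 2
    return tag + encode_varint(len(payload)) + payload


encoded = encode_packed_varints(4, [3, 270, 86942])
print(encoded.hex(" "))  # 22 06 03 8e 02 9e a7 05
```

The tag byte 0x22 is just (4 << 3) | 2, and the length byte counts payload bytes, not elements.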

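An int32 is normally represented with 4 bytes, so if all we know is that the payload is 6 bytes long, how do we know it holds three values?
Or is the example really saying that each of the three values is encoded this way separately, and they are not one concatenated string?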
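One way to see the answer: the six bytes are one concatenated run, and the element boundaries come from the varint continuation bit, not from a fixed 4-byte width. A byte whose top bit is 0 ends the current element. The following is a rough sketch under that reading; `decode_packed_varints` is a name invented here, not a protobuf API.

```python
def decode_packed_varints(payload: bytes) -> list:
    """Split a packed payload into its varint elements (hypothetical helper)."""
    values = []
    current = 0
    shift = 0
    for byte in payload:
        current |= (byte & 0x7F) << shift  # low 7 bits carry the data
        if byte & 0x80:
            shift += 7                     # MSB set: element continues
        else:
            values.append(current)         # MSB clear: element is complete
            current = 0
            shift = 0
    if shift:
        raise ValueError("payload ends in the middle of a varint")
    return values


payload = bytes.fromhex("03 8E 02 9E A7 05")
print(decode_packed_varints(payload))  # [3, 270, 86942]
```

So the length prefix (6) only tells the decoder where the field ends; the number of elements falls out of scanning the continuation bits.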

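The binary form of 86942, split into 7-bit groups, is
000 0101
010 0111
001 1110
so three bytes are needed. Because protocol buffers put the least significant group first, reverse the order of the groups:
001 1110
010 0111
000 0101
Each group is still missing its most significant bit (MSB). Whether that bit is 0 or 1 depends on whether more bytes follow: 1 if they do, 0 on the last byte.
This gives
1 001 1110
1 010 0111
0 000 0101
The corresponding hex representation is
9E
A7
05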
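The same steps can be reproduced mechanically. This is only a sketch for checking the arithmetic; `varint_groups` is a helper name chosen for this example.

```python
def varint_groups(value: int) -> list:
    """Return the 7-bit groups of a non-negative value, least significant first."""
    groups = []
    while True:
        groups.append(value & 0x7F)
        value >>= 7
        if value == 0:
            return groups


groups = varint_groups(86942)
# Set the MSB on every group except the last one.
encoded = [g | 0x80 for g in groups[:-1]] + [groups[-1]]

for group, byte in zip(groups, encoded):
    print(f"{group:07b} -> {byte:08b} = {byte:02X}")
# 0011110 -> 10011110 = 9E
# 0100111 -> 10100111 = A7
# 0000101 -> 00000101 = 05
```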

@wanghaisheng

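Before looking at the message structure, I first need to introduce a term: Varint.

A Varint is a compact way of representing a number. It uses one or more bytes per number, and smaller values take fewer bytes, which reduces the number of bytes needed to represent numbers.

For example, an int32 normally takes 4 bytes. With Varint encoding, a very small int32 value fits in 1 byte. There is a trade-off, of course: a large number needs 5 bytes. Statistically, it is unlikely that every number in a message is large, so in most cases Varint encoding represents numeric data in fewer bytes. The details of Varint follow.

The most significant bit of each byte in a Varint has a special meaning: if it is 1, the next byte is also part of the number; if it is 0, the number ends with this byte. The other 7 bits carry the value. So any number less than 128 fits in a single byte. A number of 128 or more, such as 300, takes more than one byte; 300 takes two: 1010 1100 0000 0010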
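As a quick check on the size claims above, here is a small sketch that counts how many bytes a non-negative value needs as a Varint, and decodes the two bytes shown for 300. `varint_size` is an illustrative name, not a library function.

```python
def varint_size(value: int) -> int:
    """Number of bytes a non-negative value occupies as a Varint."""
    size = 1
    while value >= 0x80:  # more than 7 bits left: another byte is needed
        value >>= 7
        size += 1
    return size


for n in (1, 127, 128, 300, 2**28 - 1, 2**28, 2**31 - 1):
    print(n, varint_size(n))
# 1 1
# 127 1
# 128 2
# 300 2
# 268435455 4
# 268435456 5
# 2147483647 5

# Decoding 1010 1100 0000 0010: drop each MSB, least significant group first.
first, second = 0b10101100, 0b00000010
decoded = (first & 0x7F) | (second << 7)
print(decoded)  # 300
```

The break-even point is 28 bits: values below 2**28 take at most 4 bytes, and anything larger in the int32 range takes 5.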
