-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
codec/number: decode number by access slice directly #3028
Conversation
src/util/codec/number.rs
Outdated
@@ -219,6 +222,112 @@ pub trait NumberDecoder: Read { | |||
|
|||
impl<T: Read> NumberDecoder for T {} | |||
|
|||
type Bytes<'a> = &'a [u8]; | |||
|
|||
const SIZE_OF_U64: usize = 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use mem::size_of
instead.
src/util/codec/number.rs
Outdated
|
||
macro_rules! read_num_bytes { | ||
($size:expr, $data:expr, $fn:path) => {{ | ||
if $data.len() < $size { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's false most of time, better check >=
first.
src/util/codec/number.rs
Outdated
}}; | ||
} | ||
/// `decode_i64` decodes value encoded by `encode_i64` before. | ||
fn decode_i64(data: &mut Bytes) -> Result<i64> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add #[inline]
.
src/util/codec/number.rs
Outdated
/// `decode_var_u64` decodes value encoded by `encode_var_u64` before. | ||
pub fn decode_var_u64(data: &mut Bytes) -> Result<u64> { | ||
let (mut x, mut s, mut i) = (0, 0, 0); | ||
while i < data.len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally the number is less than 516. So you can check if decode one byte is enough first.
src/util/codec/number.rs
Outdated
const SIZE_OF_U16: usize = 2; | ||
const SIZE_OF_F64: usize = 8; | ||
|
||
macro_rules! read_num_bytes { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we do not need a macro, try
fn read_num_bytes<T, F>(size: usize, data: &mut &[u8], f: F) -> Result<T>
where
F: Fn(&[u8]) -> T,
{
if data.len() < size {
return Err(Error::Io(io::Error::new(ErrorKind::UnexpectedEof, "eof")));
}
let buf = &data[0..size];
*data = &data[size..];
Ok(f(buf))
}
@@ -262,8 +262,8 @@ pub trait JsonDecoder: NumberDecoder { | |||
let mut value_entries_data = &data[key_entries_len..(key_entries_len + value_entries_len)]; | |||
let mut key_offset = key_entries_len + value_entries_len; | |||
for _ in 0..element_count { | |||
let key_real_offset = key_entries_data.decode_u32_le()?; | |||
let key_len = key_entries_data.decode_u16_le()?; | |||
let key_real_offset = number::decode_u32_le(&mut key_entries_data)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove all of the old key_entries_data.xxx implementations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we will remove it step by step @ngaut
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you create a issue to track that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have one in internal jira.
Could you post the benchmark results? |
src/util/codec/number.rs
Outdated
@@ -219,6 +222,128 @@ pub trait NumberDecoder: Read { | |||
|
|||
impl<T: Read> NumberDecoder for T {} | |||
|
|||
type Bytes<'a> = &'a [u8]; | |||
|
|||
fn read_num_bytes<T, F>(size: usize, data: &mut &[u8], f: F) -> Result<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add #[inline]
.
src/util/codec/number.rs
Outdated
fn decode_var_i64(data: &mut Bytes) -> Result<i64> { | ||
let v = decode_var_u64(data)?; | ||
let mut vx = v >> 1; | ||
if v & 1 != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's check equal first, it's positive probably.
src/util/codec/number.rs
Outdated
} | ||
|
||
let (mut x, mut s, mut i) = (0, 0, 0); | ||
while i < data.len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If data.len() >= 10
or last byte is less than 0x80, you can use a static loop for i in 0..9
and check the final byte instead.
It's not very convenient to post the benchmark results here since we haven't finished all in the current PR. And the benchmark results for number codec only was already posted in our internal jira(since the source code is not ready to be public) @ngaut |
src/util/codec/number.rs
Outdated
F: Fn(&[u8]) -> T, | ||
{ | ||
if data.len() >= size { | ||
let buf = &data[0..size]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant 0.
src/util/codec/number.rs
Outdated
/// `decode_var_u64` decodes value encoded by `encode_var_u64` before. | ||
#[inline] | ||
pub fn decode_var_u64(data: &mut Bytes) -> Result<u64> { | ||
if data.len() < 10 && data.iter().find(|&&x| x < 0x80).is_none() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this?
@@ -11,7 +11,10 @@ | |||
// See the License for the specific language governing permissions and | |||
// limitations under the License. | |||
|
|||
use byteorder::{BigEndian, LittleEndian, ReadBytesExt, WriteBytesExt}; | |||
// FIXME(shirly): remove following later |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be removed now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, it will be removed after all Read trait in datum
been removed.
src/util/codec/number.rs
Outdated
type Bytes<'a> = &'a [u8]; | ||
|
||
#[inline] | ||
fn read_num_bytes<T, F>(size: usize, data: &mut &[u8], f: F) -> Result<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I don't think it's a good idea to provide post-transform function in this function. As the name indicates, it should read bytes, but in fact it will return other type of things. In addition, this post process is not hard / complex to implement in callers. Are there any reasons to merge them together?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inspiration comes from https://docs.rs/byteorder/1.2.2/src/byteorder/lib.rs.html#1767.
And if we do not merge them together, the same code block would be duplicated again and again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think read_num_bytes(..).map(...)
is enough to substitute the f
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think is ok to use f
here just like https://docs.rs/byteorder/1.2.2/src/byteorder/lib.rs.html#1767. If not we should consider changing the name read_num_bytes
src/util/codec/number.rs
Outdated
@@ -219,6 +222,150 @@ pub trait NumberDecoder: Read { | |||
|
|||
impl<T: Read> NumberDecoder for T {} | |||
|
|||
type Bytes<'a> = &'a [u8]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer not to define this type alias. The name Bytes
gives a false implication about whether it is a reference or a value. &[u8]
is much more cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this alias will be much easier and clear, otherwise, we need to define the lifecycle in every decoding function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about BytesSlice
?
Rest LGTM |
What's the benefit of this PR? |
Decode number by access slice directly will be faster than using |
Oh, I don't really know about that, why does using |
@huachaohuang Take a look at internal implementations and you will know. General speaking, there is a lot of branches again and again in each read. Since main operations are really simple, these branches contributes notably. |
LGTM |
/run-integration-tests |
friendly ping @BusyJay |
@BusyJay @breeswish PTAL