codec/number: decode number by access slice directly #3028

AndreMouche · 2018-05-04T12:54:55Z

@BusyJay @breezewish PTAL

BusyJay · 2018-05-04T12:57:12Z

src/util/codec/number.rs

@@ -219,6 +222,112 @@ pub trait NumberDecoder: Read {

 impl<T: Read> NumberDecoder for T {}

+type Bytes<'a> = &'a [u8];
+
+const SIZE_OF_U64: usize = 8;


Use mem::size_of instead.
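Here is a minimal sketch of this suggestion. It assumes the constants exist only to name byte widths. `std::mem::size_of` is a const fn, so the widths stay compile-time constants while being tied to the types they describe. The constant names and the `has_u64` helper are hypothetical.

```rust
use std::mem;

// Byte widths derived from the types themselves instead of hand-written
// values such as SIZE_OF_U64 = 8. mem::size_of is a const fn, so these
// are still evaluated at compile time.
const U64_SIZE: usize = mem::size_of::<u64>();
const U16_SIZE: usize = mem::size_of::<u16>();
const F64_SIZE: usize = mem::size_of::<f64>();

/// Hypothetical helper: does `data` hold enough bytes for one u64?
fn has_u64(data: &[u8]) -> bool {
    data.len() >= U64_SIZE
}
```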

BusyJay · 2018-05-04T12:59:00Z

src/util/codec/number.rs

+
+macro_rules! read_num_bytes {
+    ($size:expr, $data:expr, $fn:path) => {{
+        if $data.len() < $size {


It's false most of the time; better to check >= first.
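The sketch below shows the reordered check. It is written as a plain function, which is the form the PR later adopts for `read_num_bytes`. It assumes `std::io::Result` stands in for the crate's own `Result` type. The common case (enough bytes available) is tested first, and the truncated-input error is the fall-through.

```rust
use std::io;

/// Sketch of `read_num_bytes` with the likely branch tested first.
/// Assumption: std::io::Result replaces the crate's own Result type.
#[inline]
fn read_num_bytes<T, F>(size: usize, data: &mut &[u8], f: F) -> io::Result<T>
where
    F: Fn(&[u8]) -> T,
{
    // Usual case: enough bytes remain, so check it first.
    if data.len() >= size {
        let buf = &data[..size];
        // Advance the caller's slice past the consumed bytes.
        *data = &data[size..];
        return Ok(f(buf));
    }
    // Rare case: truncated input.
    Err(io::Error::new(io::ErrorKind::UnexpectedEof, "eof"))
}
```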

BusyJay · 2018-05-04T12:59:33Z

src/util/codec/number.rs

+    }};
+}
+/// `decode_i64` decodes value encoded by `encode_i64` before.
+fn decode_i64(data: &mut Bytes) -> Result<i64> {


Add #[inline].
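Here is a sketch of `decode_i64` with `#[inline]`. It assumes `encode_i64` uses the usual memcomparable scheme: flip the sign bit, then write the result as a big-endian u64. That is an assumption, since the diff does not show the encoder. The `encode_i64` below is a hypothetical counterpart, included only so the round trip can be checked.

```rust
use std::io;

// Assumed encoding: flip the sign bit so that encoded bytes sort in the
// same order as the signed integers, then store big-endian.
const SIGN_MARK: u64 = 0x8000_0000_0000_0000;

/// Hypothetical encoder, included only to make the round trip testable.
fn encode_i64(v: i64) -> [u8; 8] {
    ((v as u64) ^ SIGN_MARK).to_be_bytes()
}

/// Sketch of `decode_i64`: read 8 big-endian bytes, undo the sign flip,
/// and advance the slice. #[inline] lets the compiler fold this small,
/// hot function into its callers.
#[inline]
fn decode_i64(data: &mut &[u8]) -> io::Result<i64> {
    if data.len() >= 8 {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&data[..8]);
        *data = &data[8..];
        return Ok((u64::from_be_bytes(buf) ^ SIGN_MARK) as i64);
    }
    Err(io::Error::new(io::ErrorKind::UnexpectedEof, "eof"))
}
```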

BusyJay · 2018-05-04T13:01:52Z

src/util/codec/number.rs

+/// `decode_var_u64` decodes value encoded by `encode_var_u64` before.
+pub fn decode_var_u64(data: &mut Bytes) -> Result<u64> {
+    let (mut x, mut s, mut i) = (0, 0, 0);
+    while i < data.len() {


Generally the number is less than 516, so you can first check whether decoding one byte is enough.
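Here is a sketch of `decode_var_u64` with a one-byte fast path. It assumes the wire format matches Go's `encoding/binary` Uvarint layout: 7 payload bits per byte, least-significant group first, and the high bit set on every byte except the last. It also assumes `std::io::Result` in place of the crate's own `Result` type.

```rust
use std::io;

/// Sketch of `decode_var_u64` with a single-byte fast path.
#[inline]
fn decode_var_u64(data: &mut &[u8]) -> io::Result<u64> {
    // Fast path: any value below 0x80 is encoded in exactly one byte.
    if let Some(&b) = data.first() {
        if b < 0x80 {
            *data = &data[1..];
            return Ok(u64::from(b));
        }
    }
    // General path: accumulate 7 bits per byte until a byte < 0x80.
    let (mut x, mut s) = (0u64, 0u32);
    for (i, &b) in data.iter().enumerate() {
        if i == 10 {
            // More than 10 bytes cannot fit in a u64.
            return Err(io::Error::new(io::ErrorKind::InvalidData, "overflow"));
        }
        if b < 0x80 {
            // The 10th byte may only carry bit 63.
            if i == 9 && b > 1 {
                return Err(io::Error::new(io::ErrorKind::InvalidData, "overflow"));
            }
            *data = &data[i + 1..];
            return Ok(x | (u64::from(b) << s));
        }
        x |= u64::from(b & 0x7f) << s;
        s += 7;
    }
    // Ran out of input before finding a terminating byte.
    Err(io::Error::new(io::ErrorKind::UnexpectedEof, "eof"))
}
```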

overvenus · 2018-05-05T03:26:08Z

src/util/codec/number.rs

+const SIZE_OF_U16: usize = 2;
+const SIZE_OF_F64: usize = 8;
+
+macro_rules! read_num_bytes {


Looks like we don't need a macro here; try:

fn read_num_bytes<T, F>(size: usize, data: &mut &[u8], f: F) -> Result<T>
where
    F: Fn(&[u8]) -> T,
{
    if data.len() < size {
        return Err(Error::Io(io::Error::new(ErrorKind::UnexpectedEof, "eof")));
    }
    let buf = &data[0..size];
    *data = &data[size..];
    Ok(f(buf))
}

ngaut · 2018-05-07T05:39:58Z

src/coprocessor/codec/mysql/json/binary.rs

@@ -262,8 +262,8 @@ pub trait JsonDecoder: NumberDecoder {
        let mut value_entries_data = &data[key_entries_len..(key_entries_len + value_entries_len)];
        let mut key_offset = key_entries_len + value_entries_len;
        for _ in 0..element_count {
-            let key_real_offset = key_entries_data.decode_u32_le()?;
-            let key_len = key_entries_data.decode_u16_le()?;
+            let key_real_offset = number::decode_u32_le(&mut key_entries_data)?;


Can we remove all of the old key_entries_data.xxx implementations?

Yes, we will remove them step by step. @ngaut

Could you create an issue to track that?

We already have one in our internal Jira.
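In the diff above, the JSON decoder stops calling methods on a Read-based trait. Instead it calls free functions that take the slice as `&mut &[u8]` and advance it on each call. Below is a minimal sketch of that calling pattern. The little-endian `decode_u32_le`/`decode_u16_le` helpers are written here for illustration and are not TiKV's code. The `read_key_entries` wrapper is hypothetical, and `std::io::Result` is assumed in place of the crate's `Result`.

```rust
use std::io;

fn eof() -> io::Error {
    io::Error::new(io::ErrorKind::UnexpectedEof, "eof")
}

/// Illustrative decoder: consumes 4 little-endian bytes from the front.
fn decode_u32_le(data: &mut &[u8]) -> io::Result<u32> {
    if data.len() >= 4 {
        let v = u32::from_le_bytes([data[0], data[1], data[2], data[3]]);
        *data = &data[4..];
        return Ok(v);
    }
    Err(eof())
}

/// Illustrative decoder: consumes 2 little-endian bytes from the front.
fn decode_u16_le(data: &mut &[u8]) -> io::Result<u16> {
    if data.len() >= 2 {
        let v = u16::from_le_bytes([data[0], data[1]]);
        *data = &data[2..];
        return Ok(v);
    }
    Err(eof())
}

/// Hypothetical wrapper: reads `count` (offset: u32, len: u16) key entries.
/// The cursor is a local `&[u8]` passed as `&mut`, so each call advances it.
fn read_key_entries(mut key_entries_data: &[u8], count: usize) -> io::Result<Vec<(u32, u16)>> {
    let mut entries = Vec::with_capacity(count);
    for _ in 0..count {
        let key_real_offset = decode_u32_le(&mut key_entries_data)?;
        let key_len = decode_u16_le(&mut key_entries_data)?;
        entries.push((key_real_offset, key_len));
    }
    Ok(entries)
}
```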

ngaut · 2018-05-07T06:16:29Z

Could you post the benchmark results?

BusyJay · 2018-05-07T05:50:36Z

src/util/codec/number.rs

@@ -219,6 +222,128 @@ pub trait NumberDecoder: Read {

 impl<T: Read> NumberDecoder for T {}

+type Bytes<'a> = &'a [u8];
+
+fn read_num_bytes<T, F>(size: usize, data: &mut &[u8], f: F) -> Result<T>


Add #[inline].

BusyJay · 2018-05-07T05:55:10Z

src/util/codec/number.rs

+fn decode_var_i64(data: &mut Bytes) -> Result<i64> {
+    let v = decode_var_u64(data)?;
+    let mut vx = v >> 1;
+    if v & 1 != 0 {


Let's check the equal case (v & 1 == 0) first; the value is probably positive.
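Here is a sketch of the zigzag step in `decode_var_i64`, with the even (non-negative) case tested first. It operates on the u64 already returned by `decode_var_u64`. The mapping is the one the diff implies: shift right by one, and invert when the low bit is set. `zigzag_decode` and its inverse `zigzag_encode` are hypothetical names used for illustration.

```rust
/// Zigzag maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...: the low bit
/// carries the sign and the remaining bits carry the magnitude.
#[inline]
fn zigzag_decode(v: u64) -> i64 {
    let vx = v >> 1;
    // Low bit clear means non-negative; assumed to be the common case,
    // so it is checked first.
    if v & 1 == 0 {
        vx as i64
    } else {
        // Low bit set means negative: the magnitude bits were inverted.
        (!vx) as i64
    }
}

/// Hypothetical inverse, included only so the mapping can be tested.
fn zigzag_encode(x: i64) -> u64 {
    ((x << 1) ^ (x >> 63)) as u64
}
```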

BusyJay · 2018-05-07T07:23:18Z

src/util/codec/number.rs

+    }
+
+    let (mut x, mut s, mut i) = (0, 0, 0);
+    while i < data.len() {


If data.len() >= 10 or the last byte is less than 0x80, you can use a static loop (for i in 0..9) and check the final byte separately.
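Here is a sketch of the static-loop idea, assuming the same 7-bits-per-byte varint layout and `std::io::Result` in place of the crate's `Result`. Its guard follows the condition quoted in the later diff: fewer than 10 bytes with no byte below 0x80 means the input is truncated. Once that guard passes, a fixed `0..9` loop cannot index past the end. Either a terminating byte returns early, or at least 10 bytes are present and the 10th byte is handled separately.

```rust
use std::io;

/// Sketch: varint decoding with a fixed trip count after an up-front guard.
fn decode_var_u64(data: &mut &[u8]) -> io::Result<u64> {
    // Guard: a short input with no terminating byte (< 0x80) is truncated.
    if data.len() < 10 && data.iter().all(|&b| b >= 0x80) {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "eof"));
    }
    let mut x = 0u64;
    // At most 9 bytes of 7 payload bits each (63 bits). Indexing is safe:
    // a shorter input contains a terminator, which returns before the end.
    for i in 0..9 {
        let b = data[i];
        if b < 0x80 {
            *data = &data[i + 1..];
            return Ok(x | (u64::from(b) << (7 * i)));
        }
        x |= u64::from(b & 0x7f) << (7 * i);
    }
    // 10th byte: only bit 63 is left, so it must be 0 or 1.
    let b = data[9];
    if b > 1 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "overflow"));
    }
    *data = &data[10..];
    Ok(x | (u64::from(b) << 63))
}
```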

AndreMouche · 2018-05-07T08:14:44Z

It's not very convenient to post the benchmark results here, since not all of the work is finished in the current PR. The benchmark results for the number codec alone have already been posted in our internal Jira (the source code is not ready to be made public). @ngaut

BusyJay · 2018-05-07T11:56:58Z

src/util/codec/number.rs

+    F: Fn(&[u8]) -> T,
+{
+    if data.len() >= size {
+        let buf = &data[0..size];


Redundant 0.

BusyJay · 2018-05-07T12:01:06Z

src/util/codec/number.rs

+/// `decode_var_u64` decodes value encoded by `encode_var_u64` before.
+#[inline]
+pub fn decode_var_u64(data: &mut Bytes) -> Result<u64> {
+    if data.len() < 10 && data.iter().find(|&&x| x < 0x80).is_none() {


What's this?

breezewish · 2018-05-07T15:47:57Z

src/util/codec/number.rs

@@ -11,7 +11,10 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.

-use byteorder::{BigEndian, LittleEndian, ReadBytesExt, WriteBytesExt};
+// FIXME(shirly): remove following later


Can this be removed now?

No, it will be removed after all uses of the Read trait in datum have been removed.

breezewish · 2018-05-08T07:06:16Z

src/util/codec/number.rs

+type Bytes<'a> = &'a [u8];
+
+#[inline]
+fn read_num_bytes<T, F>(size: usize, data: &mut &[u8], f: F) -> Result<T>


Honestly, I don't think it's a good idea to pass a post-transform function into this function. As the name indicates, it should read bytes, but in fact it returns values of other types. In addition, this post-processing is not hard or complex to implement in callers. Is there any reason to merge them?

The inspiration comes from https://docs.rs/byteorder/1.2.2/src/byteorder/lib.rs.html#1767.
If we do not merge them, the same code block would be duplicated over and over.

I think read_num_bytes(..).map(...) is enough to replace f here?

I think it's OK to use f here, just like https://docs.rs/byteorder/1.2.2/src/byteorder/lib.rs.html#1767. If not, we should consider renaming read_num_bytes.
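For comparison, here is a sketch of the alternative raised in this thread. A hypothetical `read_bytes` helper only splits bytes off the front of the slice, and each caller converts them with `.map(...)`. It assumes `std::io::Result` in place of the crate's `Result` type, and `decode_u16` is an illustrative caller.

```rust
use std::io;

/// Hypothetical helper: split `size` bytes off the front of `data`,
/// advance `data`, and return the bytes unconverted.
#[inline]
fn read_bytes<'a>(size: usize, data: &mut &'a [u8]) -> io::Result<&'a [u8]> {
    let slice: &'a [u8] = *data;
    if slice.len() >= size {
        let (head, tail) = slice.split_at(size);
        *data = tail;
        return Ok(head);
    }
    Err(io::Error::new(io::ErrorKind::UnexpectedEof, "eof"))
}

/// Illustrative caller: the conversion lives at the call site via `map`.
fn decode_u16(data: &mut &[u8]) -> io::Result<u16> {
    read_bytes(2, data).map(|b| u16::from_be_bytes([b[0], b[1]]))
}
```

The trade-off is the one debated here. The closure form keeps the conversion inside a single helper call. The `map` form keeps the helper's return type faithful to its name.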

breezewish · 2018-05-08T07:12:44Z

src/util/codec/number.rs

@@ -219,6 +222,150 @@ pub trait NumberDecoder: Read {

 impl<T: Read> NumberDecoder for T {}

+type Bytes<'a> = &'a [u8];


I prefer not to define this type alias. The name Bytes falsely implies that it might be a value rather than a reference. &[u8] is much cleaner.

I think this alias is easier to read and clearer; otherwise, we would need to declare the lifetime in every decoding function.

How about BytesSlice?
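For comparison, here is a sketch of both signature styles, using the `BytesSlice` name proposed here. Under Rust's lifetime elision rules, neither needs an explicit lifetime parameter in these simple cases, so the choice is largely about readability. The `first_byte_*` functions are hypothetical.

```rust
/// The alias under discussion (renamed from `Bytes` to `BytesSlice`).
type BytesSlice<'a> = &'a [u8];

/// With the alias: the lifetime parameter is elided.
fn first_byte_alias(data: &mut BytesSlice) -> Option<u8> {
    let slice = *data;
    let (&b, rest) = slice.split_first()?;
    *data = rest;
    Some(b)
}

/// Without the alias: elision also applies to a plain `&mut &[u8]`.
fn first_byte_plain(data: &mut &[u8]) -> Option<u8> {
    let slice = *data;
    let (&b, rest) = slice.split_first()?;
    *data = rest;
    Some(b)
}
```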

breezewish · 2018-05-08T09:22:53Z

Rest LGTM

huachaohuang · 2018-05-09T09:29:28Z

What's the benefit of this PR?

AndreMouche · 2018-05-09T10:26:09Z

Decoding numbers by accessing the slice directly is faster than the previous approach of using the Read trait. @huachaohuang

huachaohuang · 2018-05-10T01:53:20Z

Oh, I didn't know that. Why does using the Read trait hurt performance?

breezewish · 2018-05-10T05:26:47Z

@huachaohuang Take a look at the internal implementations and you will see. Generally speaking, every read goes through a lot of branches, over and over. Since the main operations are really simple, these branches contribute notably.
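Here is a sketch contrasting the two call shapes discussed in this thread. A generic decoder goes through `Read::read_exact`, while a slice decoder does one length check and then accesses the bytes directly. It only illustrates the structure and makes no performance claim. The actual cost of the Read path depends on the Rust and byteorder versions in use. The function names are hypothetical.

```rust
use std::io::{self, Read};

/// Read-trait style: generic over any reader; every call goes through
/// `Read::read_exact` and its error handling.
fn decode_u64_read<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(u64::from_be_bytes(buf))
}

/// Slice style (the approach of this PR): one length check, then direct
/// access to the bytes and an in-place advance of the slice.
fn decode_u64_slice(data: &mut &[u8]) -> io::Result<u64> {
    if data.len() >= 8 {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&data[..8]);
        *data = &data[8..];
        return Ok(u64::from_be_bytes(buf));
    }
    Err(io::Error::new(io::ErrorKind::UnexpectedEof, "eof"))
}
```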

breezewish · 2018-05-10T05:27:00Z

LGTM

AndreMouche · 2018-05-10T06:20:49Z

/run-integration-tests

AndreMouche · 2018-05-10T06:21:12Z

friendly ping @BusyJay

codec/number: decode number without Read trait

edda340

BusyJay reviewed May 4, 2018


overvenus reviewed May 5, 2018


AndreMouche added 3 commits May 7, 2018 10:16

codec/number: address comments

efcce17

mocktikv/mvcc: make fmt

3e88a34

codec/number: fix bug

c9ec4c7

ngaut reviewed May 7, 2018


BusyJay reviewed May 7, 2018


Merge branch 'master' into number_decode

4a5d4f9

address comments

45205b2

BusyJay reviewed May 7, 2018


breezewish reviewed May 7, 2018


AndreMouche added 7 commits May 8, 2018 10:29

address comments

9339e7e

Merge branch 'master' into number_decode

988a6da

address comments

0265965

address comments

98cfe64

Merge branch 'master' into number_decode

ad96f55

address comments

8043112

merge master

95f8914

breezewish reviewed May 8, 2018


Merge branch 'master' into number_decode

2b8fdea

AndreMouche added 2 commits May 10, 2018 10:07

codec/number: rename Bytes to BytesSlice

8af467b

codec/number: mv BytesSlice from number to mod

2813a32

codec/number: make all decode function for number public

45dac18

Merge branch 'master' into number_decode

c0df0ad

AndreMouche mentioned this pull request May 10, 2018

coprocessor/codec: remove read trait from datum and table in decode #3049

Closed

Merge branch 'master' into number_decode

dc36de8

BusyJay approved these changes May 10, 2018


AndreMouche merged commit 1d8cf7c into tikv:master May 11, 2018

sticnarf pushed a commit to sticnarf/tikv that referenced this pull request Oct 27, 2019

codec/number: decode number by access slice directly (tikv#3028)

44bfad7
