parse_integers_using_lower_nibble #127

dkuku · 2023-07-13T21:54:48Z

Currently, integers are parsed by pattern matching on a byte between 48 and 57 and then subtracting 48, the ASCII code of the character `0`.
Since 48 is 0x30, we can instead pattern match that the upper half of the byte is 3 and take the lower half directly, only checking that it is less than 10.
I prepared a Livebook that compares both approaches, and there is a 4-5% speed difference.

Comparison: 
half byte        3.04 K
original         2.93 K - 1.04x slower +13.23 μs

https://gist.github.com/dkuku/9004bbf92cb3b81e8ed6fe211b9c7fc6
I'm not sure how to measure the compilation time, but it should also be a bit faster because less code is generated.
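A minimal sketch of the two matching strategies described above, written as plain Elixir function clauses. The module and function names (`DigitMatch`, `full_byte/1`, `nibble/1`) are hypothetical; the clauses NimbleParsec actually generates look different.

```elixir
defmodule DigitMatch do
  # Hypothetical module for illustration; not NimbleParsec's generated code.

  # Original approach: match the whole byte against ?0..?9 (48..57)
  # and subtract ?0 (48) to get the digit's value.
  def full_byte(<<c, rest::binary>>) when c in ?0..?9, do: {:ok, c - ?0, rest}
  def full_byte(_), do: :error

  # Nibble approach: ?0 is 0x30, so every ASCII digit has 3 in its high nibble
  # and its numeric value in the low nibble; only `d < 10` needs checking.
  def nibble(<<3::4, d::4, rest::binary>>) when d < 10, do: {:ok, d, rest}
  def nibble(_), do: :error
end
```

Both `DigitMatch.full_byte("7abc")` and `DigitMatch.nibble("7abc")` return `{:ok, 7, "abc"}`, while `DigitMatch.nibble(":")` returns `:error` because `:` is 0x3A, whose low nibble is 10.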

josevalim · 2023-07-13T22:28:13Z

This is very nice! Instead of adding a special combinator, could we perhaps use a heuristic? Then we could find more occasions to apply this automatically on top of ASCII ranges.

dkuku · 2023-07-14T08:27:08Z

@josevalim
I've been thinking about it, but I don't think it makes much sense: integers are a special case where we map 0x3y to just y.
The only place where it could be useful is converting a string to an octal or binary number. It won't even help when converting a hex representation to a hex word, because the letters start at 0x41 (`A`) and 0x61 (`a`), so the lower bound still needs to be checked: `x > 0 and x < 7; x = x + 9` on the lower nibble, which for uppercase letters is the same as `x > 0x40 and x < 0x47; x = x - 0x37` on the full byte.
But we don't have combinators for any of these yet. Having another example that makes sense would make it easier to design this. Unless you have some ideas where it could be used?
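To illustrate why hex letters gain little, here is a sketch that decodes a single hex digit both ways. `HexDigit`, `full_byte/1`, and `nibble/1` are hypothetical names; each function takes a one-byte binary and returns the digit value or `:error`.

```elixir
defmodule HexDigit do
  # Hypothetical helper for illustration only.

  # Full-byte ranges: '0'..'9' = 0x30..0x39, 'A'..'F' = 0x41..0x46,
  # 'a'..'f' = 0x61..0x66.
  def full_byte(<<c>>) when c in ?0..?9, do: c - ?0
  def full_byte(<<c>>) when c in ?A..?F, do: c - ?A + 10
  def full_byte(<<c>>) when c in ?a..?f, do: c - ?a + 10
  def full_byte(_), do: :error

  # Nibble version: the high nibble picks the character class, but the letter
  # clauses still need a lower bound (n > 0) to reject 0x40 ('@') and 0x60 ('`'),
  # plus an adjustment (+ 9) instead of getting the value for free.
  def nibble(<<3::4, n::4>>) when n < 10, do: n
  def nibble(<<4::4, n::4>>) when n > 0 and n < 7, do: n + 9
  def nibble(<<6::4, n::4>>) when n > 0 and n < 7, do: n + 9
  def nibble(_), do: :error
end
```

Only the digit clause benefits: the letter clauses keep a two-sided check and an arithmetic step, just on the low nibble instead of the whole byte.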

josevalim · 2023-07-14T10:38:34Z

The immediate heuristic I can think of is this:

- If the distance between both numbers is < 16 and the rem(first, 16) == 0, then apply it
- If the distance between both numbers is < 32 and the rem(first, 32) == 0, then apply it
- If the distance between both numbers is < 64 and the rem(first, 64) == 0, then apply it
- If the distance between both numbers is < 128 and the rem(first, 128) == 0, then apply it

It could be extended to 2, 4, and 8, but I am not sure if those will be worth it.
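A minimal sketch of this heuristic for a byte range `first..last`, assuming 0 <= first <= last <= 255. `RangeSplit`, `split/2`, and `match_byte/3` are hypothetical names, and this only checks the condition and performs the equivalent match; it does not generate code the way NimbleParsec's compiler would.

```elixir
defmodule RangeSplit do
  # Hypothetical sketch: block sizes are tried smallest first,
  # each paired with the number of low bits it covers.
  @candidates [{16, 4}, {32, 5}, {64, 6}, {128, 7}]

  # Returns {:split, high, high_bits, low_bits} when the range fits in one
  # aligned block of 2^low_bits bytes, otherwise :none.
  def split(first, last) when first <= last do
    Enum.find_value(@candidates, :none, fn {size, low_bits} ->
      if last - first < size and rem(first, size) == 0 do
        {:split, div(first, size), 8 - low_bits, low_bits}
      end
    end)
  end

  # Matches one byte against first..last and returns {:ok, byte - first}.
  # With a split, the high bits are compared as a constant and only the
  # upper bound of the low bits is checked; the low bits equal byte - first
  # because first is aligned to the block size.
  def match_byte(<<byte>>, first, last) do
    case split(first, last) do
      {:split, high, high_bits, low_bits} ->
        case <<byte>> do
          <<^high::size(high_bits), low::size(low_bits)>> when low <= last - first ->
            {:ok, low}

          _ ->
            :error
        end

      :none ->
        if byte in first..last, do: {:ok, byte - first}, else: :error
    end
  end
end
```

Under this sketch, `?0..?9` yields `{:split, 3, 4, 4}` (the high-nibble-3 case from the PR), `?@..?Z` fits an aligned 32-byte block as `{:split, 2, 3, 5}`, and `?a..?z` yields `:none`, since 97 is not a multiple of 32, 64, or 128 and the range is too wide for 16.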

dkuku · 2023-07-14T21:32:17Z

I did some testing and there is no benefit for strings. With integers, the benefit is that we get the integer already converted 'for free' and don't have to subtract `?0`. These are different return values from what `ascii(?0..?9)` returned.
Partial matching on strings also has other problems. I'm matching on only part of the byte, but I need to return everything that was given to the function, so two variables are needed. Let's say I want to match on the string "1"; I need something like this:
`defp function(full = <<3::4, part::4>>) when part == 1, do: full`
or I have to reconstruct the actual value:
`defp function(<<3::4, part::4>>) when part == 1, do: <<3::4, part::4>>`
The second form makes more sense because binary patterns cannot be matched in parallel, so I can't match on part of the byte and also on the full byte in one go.
It is also significantly slower:

Comparison: 
original         3.49 K
half_byte        2.37 K - 1.47x slower +134.91 μs

Memory usage statistics:

Name         Memory usage
original        493.70 KB
half_byte       704.64 KB - 1.43x memory usage +210.94 KB
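A runnable sketch of the two string-matching variants from the comment above, assuming the matched byte is followed by more input, as in parser-generated clauses. `StringMatch`, `full_byte/1`, and `nibble/1` are hypothetical names. The point it shows: the nibble clause has to rebuild `<<3::4, part::4>>` to return the matched text, while the full-byte clause can return a literal.

```elixir
defmodule StringMatch do
  # Hypothetical clauses that match the one-byte string "1" at the start of
  # the input and return {:ok, matched, rest}.

  # Full-byte match: the literal pattern already is the value to return.
  def full_byte(<<?1, rest::binary>>), do: {:ok, "1", rest}
  def full_byte(_), do: :error

  # Nibble match: only the two nibbles are bound, so the matched byte must be
  # rebuilt from them. Binding the whole argument with `full = ...` would
  # capture `rest` as well, not just the matched byte.
  def nibble(<<3::4, part::4, rest::binary>>) when part == 1,
    do: {:ok, <<3::4, part::4>>, rest}

  def nibble(_), do: :error
end
```

Both `StringMatch.full_byte("1abc")` and `StringMatch.nibble("1abc")` return `{:ok, "1", "abc"}`; they differ only in how the returned `"1"` is produced.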

josevalim · 2023-07-14T22:30:11Z

I see! Btw, which OTP version are you using for testing? 26 has more recent optimizations.

dkuku · 2023-07-15T06:01:07Z

Yes, I'm using 26. But I just ran the same Livebook on 25, and there my implementation is 3x slower, so there is no point for anyone with an older OTP version. Maybe this should wait until Elixir requires OTP 26?

josevalim · 2023-07-15T06:41:15Z

Yeah, given the low gain, we should probably hold this one for a while (and also confirm this will hold in future Erlang/OTP versions). :) Thank you!

