-
-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize (vectorize?) integer deserialization #25
Comments
My thoughts on parsing integers start here: https://stackoverflow.com/a/74453028 |
@aqrit, looked at your posts (and mentioned posts/articles) quite a bit. I've changed the function to detect invalid characters too. This is a first stab at the problem, I've not benchmarked it. For simdzone, we know what type of integer (8 bit, 16 bit or 32 bit) we're going to parse, so chances are we can do much better. static int parse_integer(const char *buffer, size_t length, uint64_t *number)
{
uint64_t digits, bad_chars;
static const uint8_t shift[] =
{ 0, 56, 48, 40, 32, 24, 16, 8, 0, 0, 0, 0, 0, 0, 0, 0 };
length &= 0xfull;
memcpy(&digits, buffer, sizeof(digits)); // assumes little-endian
// flip 0x30 (ASCII 0 (0x30) - 9 (0x39)) bits, detect non-digits
digits ^= 0x3030303030303030ull;
// shift off non-digits
digits <<= shift[length];
bad_chars = (digits | (digits + 0x0606060606060606ull)) & 0xf0f0f0f0f0f0f0f0ull;
digits = (digits & 0x0f0f0f0f0f0f0f0fllu) * 2561 >> 8;
digits = (digits & 0x00ff00ff00ff00ffllu) * 6553601 >> 16;
digits = (digits & 0x0000ffff0000ffffllu) * 42949672960001 >> 32;
*number = digits;
return (bad_chars == 0);
}
int main(int argc, char *argv[])
{
char buffer[16] = { 0 }; // padding guaranteed in simdzone
size_t length;
if (argc == 2)
length = strlen(argv[1]);
else
length = 0;
memcpy(buffer, argv[1], length&0xfull);
uint64_t number;
if (parse_integer(buffer, length, &number)) {
fprintf(stdout, "number: %llu\n", number);
return 0;
} else {
fprintf(stderr, "Invalid number\n");
return -1;
}
} |
I found that with validation required, it's incredibly hard to beat the naive implementation. I focused on static int parse_int8_swar(const char *str, size_t len, uint8_t *num)
{
union { uint8_t as_str[4]; uint32_t as_int; } digits;
memcpy(&digits.as_int, str, sizeof(digits));
// flip 0x30, detect non-digits
digits.as_int ^= 0x30303030lu;
// shift off trash bytes
digits.as_int <<= (4 - (len & 0x3)) * 8;
const uint32_t x = digits.as_str[1] * 10 + digits.as_str[2];
const uint32_t y = x * 10 + digits.as_str[3];
*num = (uint8_t)y;
return (digits.as_int & 0xf0f0f0f0) == 0 && y < 256 && len && len < 4;
} Where naive is: __attribute__((noinline))
static int parse_int8_naive(const char *str, size_t len, uint8_t *num)
{
uint32_t n = 0;
for (size_t i=0, r=len&0x3; i < r; i++) {
uint8_t d = str[i] - '0';
if (d > 9)
return 0;
n = n * 10 + d;
}
*num = (uint8_t)n;
return n < 256 && len && len < 4;
} Don't know if it's worth the trouble. @aqrit, @lemire, any pointers? |
Here is my attempt: https://lemire.me/blog/2023/11/28/parsing-8-bit-integers-quickly/ |
(My tests suggest that you can definitively do much better than naive with SWAR.) |
Great stuff @lemire. Thanks! |
@k0ekk0ek I am tweaking it a bit since the initial version did not fully validate the input. |
I forgot to add |
I recommend the following... int parse_uint8_fastswar(const char *str, size_t len, uint8_t *num) {
if(len == 0 || len > 3) { return 0; }
union { uint8_t as_str[4]; uint32_t as_int; } digits;
memcpy(&digits.as_int, str, sizeof(digits));
digits.as_int ^= 0x30303030lu;
digits.as_int <<= ((4 - len) * 8);
uint32_t all_digits = ((digits.as_int & 0xf0f0f0f0) |
((0x76767676 + digits.as_int) & 0x80808080)) == 0;
*num = (uint8_t)((UINT64_C(0x640a0100) * digits.as_int) >> 32);
return all_digits & ((__builtin_bswap32(digits.as_int) <= 0x020505));
} I think it covers everything. |
Slightly more efficient... int parse_uint8_fastswar(const char *str, size_t len, uint8_t *num) {
if(len == 0 || len > 3) { return 0; }
union {
uint8_t as_str[4];
uint32_t as_int;
} digits;
memcpy(&digits.as_int, str, sizeof(digits));
// flip 0x30, detect non-digits
digits.as_int ^= 0x30303030lu;
// shift off trash bytes, technically undefined behaviour when ((4 - len) * 8) is not
// in [0,32) prior to C17 / C++14.
digits.as_int <<= ((4 - len) * 8);
// check all digits
uint32_t all_digits = ((digits.as_int | (0x06060606 + digits.as_int)) & 0xF0F0F0F0) == 0;
*num = (uint8_t)((0x640a01 * digits.as_int) >> 24);
return all_digits & ((__builtin_bswap32(digits.as_int) <= 0x020505));
} This is now squarely 2x faster than naive on random inputs, and still measurably faster on predictable inputs. It is likely practical. |
Thanks @lemire! It's amazing how much time one can spend on this stuff. I have been trying to workaround the shift (note that
I'll give this some more thought at a later stage. |
Or, maybe it's just better solved in SIMD? Trying to get things to go "fast" for the fallback parser may not be the right choice(?) From a practical perspective, users won't notice as no-one is using CPU's that don't support SIMD for real work. Anyway, I'll have to polish the current code for the first release, I'll work on this more afterwards. (I have some vacation coming up 🙂) |
Note that this is true of the naïve version as well. |
Random blog post I came across. May be worth investigating at some point. |
Currently, converting numbers to binary is done the idiomatic way. Since integers (int8, int16 and int32) occur quite a lot in zone files, e.g. TTL, optimization may result in a nice peformance boost.
To tackle this, it's good to know that for some integer values, use of a symbolic value is allowed to. e.g. algorithm field in
DS
records. Also, TTL values are 32-bit integer fields, but the MSB must not be set.The text was updated successfully, but these errors were encountered: