Integer from floating-point description #67

Open
oscbyspro opened this issue Aug 10, 2024 · 10 comments
Labels
addition oh, so shiny! brrr such code, much wow

Comments

@oscbyspro
Owner

oscbyspro commented Aug 10, 2024

I'm sure I'll need to losslessly parse arbitrary floating-point numbers at some point.

@oscbyspro oscbyspro added addition oh, so shiny! brrr such code, much wow labels Aug 10, 2024
@oscbyspro
Owner Author

I'll work on this for a while. Maybe I'll fix Apple's JSON decoder while I'm at it.

@oscbyspro
Owner Author

🪄 ✨ ✨ ✨ ✨ ✨ ✨ ✨ ✨ ✨ ✨ ✨ ✨ ✨ ✨ ✨ ✨ ✨ ✨ ✨ ✨ ✨ ✨

@oscbyspro
Owner Author

oscbyspro commented Aug 13, 2024

So it turns out that you can clamp the exponent to the bounds of Int, if the maximum number of characters is Int.max. But this only works if you normalize the integer, fraction, and exponent parts in a specific way. Otherwise, the saturation may introduce lossy behavior near Int.max character inputs. You'll never actually parse inputs that big, but there's a proper way of doing it with signed same-size integers. I considered using sign-magnitude to get an extra bit of space, but then I thought of the normalization thing. Signed integer buffer sizes save the day again.
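For illustration, the clamping step on its own might look like the sketch below. It only shows a saturating parse of the decimal exponent digits into `Int`; the normalization of the integer, fraction, and exponent parts is not spelled out in this thread, so it is left out. `clampedExponent` is a hypothetical name, not something from the project.

```swift
/// Hypothetical helper: parses ASCII decimal exponent digits into an Int,
/// saturating at Int.max (or Int.min when negative) instead of failing.
/// Returns nil when the text is empty or contains a non-digit.
func clampedExponent(_ digits: String, isNegative: Bool) -> Int? {
    guard !digits.isEmpty else { return nil }
    var value = 0
    var saturated = false
    for character in digits {
        // Keep validating every character, even after saturation.
        guard character.isASCII, let digit = character.wholeNumberValue else { return nil }
        if saturated { continue }
        // Accumulate toward the sign so that Int.min is reachable exactly.
        let (product, overflowA) = value.multipliedReportingOverflow(by: 10)
        let (sum, overflowB) = product.addingReportingOverflow(isNegative ? -digit : digit)
        if overflowA || overflowB {
            saturated = true
            value = isNegative ? Int.min : Int.max
        } else {
            value = sum
        }
    }
    return value
}
```

Whether the clamped value is still lossless depends on the normalization mentioned above, which this sketch does not attempt.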

@oscbyspro
Owner Author

But a sign-magnitude exponent is simpler in practice because applying the normalization bias correctly is nontrivial.

@oscbyspro
Owner Author

oscbyspro commented Aug 15, 2024

I want to support JSON, JSON5 and whatever Swift is doing.

The JSON ones work, but I just realized that Swift parses this thing: {sign}0x{hex}.{hex}p{sign}{decimal}.

I'll totally parse that too, but mixing hex and decimal is rude :(
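As a rough sketch of what that format means: the two hex parts form one hexadecimal significand, each fraction digit is worth 4 binary places, and the decimal exponent after `p` is a power of two. The helper below is hypothetical (not the project's API), skips sign handling and text splitting, and only targets `Int` rather than arbitrary integers.

```swift
/// Hypothetical helper: evaluates {hex}.{hex}p{decimal} as an exact Int.
/// Returns nil when the value is not a whole number or does not fit in Int.
func exactInteger(hexInteger: String, hexFraction: String, binaryExponent: Int) -> Int? {
    // Fold both hex parts into a single significand.
    guard let significand = Int(hexInteger + hexFraction, radix: 16) else { return nil }
    if significand == 0 { return 0 }
    // Each hex fraction digit moves the binary point by 4 places.
    let shift = binaryExponent - 4 * hexFraction.count
    if shift >= 0 {
        // A left shift must not move any set bit into or past the sign bit.
        guard shift < significand.leadingZeroBitCount else { return nil }
        return significand << shift
    } else {
        // A right shift is exact only if it drops nothing but zero bits.
        guard significand.trailingZeroBitCount >= -shift else { return nil }
        return significand >> -shift
    }
}
```

For example, `0x1.8p3` has significand `0x18` (24) and an effective shift of 3 - 4 = -1, so `exactInteger(hexInteger: "1", hexFraction: "8", binaryExponent: 3)` gives 12, which matches the Swift literal `0x1.8p3`.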

@oscbyspro
Owner Author

oscbyspro commented Aug 15, 2024

Sign-and-magnitude exponent saturation does not work in the case of 0x{hex}.{hex}p{decimal} since the exponent part represents a binary multiplier. You need more bits, basically. It might be simpler to just limit the input size to some silly-big-but-small-enough number or settle on a larger exponent type. Hm.

@oscbyspro
Owner Author

oscbyspro commented Aug 17, 2024

So, I basically have a lossless strtod-as-int now. I'll just have to add some more tests and clean it up a bit. Additionally, while I'm sure I'm the last one to figure this out by myself, you can skip a bunch of comparisons by lowercasing bytes as (x|32). It's quite neat and probably by design.

oscbyspro added a commit that referenced this issue Aug 18, 2024
It turns out that you can map uppercase letters to lowercase letters, and vice versa, with a bitwise operation (e.g. x | 32). It works because they are offset by a power of two relative to each other in ASCII. This trick turns 3 branches into 2, which makes hexadecimal decoding a few % faster. Note that the uppercase-to-lowercase transformation is simpler to represent in code, because it's easier to set a bit than it is to clear a bit using integer literals. I also reworked the numeral decoding and encoding test suite while I was at it.
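A minimal sketch of the trick, assuming a standalone byte decoder (`decodeHexDigit` is a made-up name, not the function from the commit). Setting bit 5 folds "A"..."F" onto "a"..."f", so one range check covers both cases:

```swift
/// Hypothetical helper: decodes one ASCII hexadecimal digit, or returns nil.
func decodeHexDigit(_ byte: UInt8) -> UInt8? {
    // Branch 1: "0"..."9" (0x30...0x39). Wrapping subtraction sends
    // anything below "0" to a large value, so one comparison is enough.
    let decimal = byte &- 0x30
    if decimal < 10 { return decimal }
    // Setting bit 5 maps "A"..."F" to "a"..."f" and leaves "a"..."f" unchanged.
    let lowercased = byte | 32
    // Branch 2: "a"..."f" (0x61...0x66), which now also covers "A"..."F".
    let letter = lowercased &- 0x61
    if letter < 6 { return letter &+ 10 }
    return nil
}
```

Without the `| 32`, the uppercase range would need its own comparison, which is the third branch the commit message refers to.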
@oscbyspro
Owner Author

oscbyspro commented Aug 18, 2024

Hm. There's a tricky edge case for integer sizes that aren't multiples of 4. Adding a hexadecimal digit might overflow a significand that is brought back into range by the binary exponent. Remember that strtod uses a hex-hex-bin format. I have banned silly integers for a reason in this project, but I suppose I might want to solve it regardless.

@oscbyspro
Owner Author

I've been writing it in Swift proper with thoughts of porting it to this project, but the opposite seems wiser given that I have infrastructure I might want to evolve with the solution. Zzz. At least I found some edge cases that only apply to the former.

@oscbyspro
Owner Author

oscbyspro commented Aug 21, 2024

I wonder if the exponent type in BinaryInteger/power(_:) (#53) should be UX. I think there's some merit in using Magnitude but the shift-by-1 approach doesn't scale well beyond register sizes and arbitrary integers cannot allocate beyond that. Hm.
