-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid 64bit division #291
Avoid 64bit division #291
Conversation
I tested this after rebasing onto the new main, since the new intrinsics code saves a decent chunk of code space too. On a trivial project I tested on it's clearly a saving (240 bytes dev, 944 bytes release). It's a trade-off. 272 bytes isn't a big chunk out of the 2MB on the Pico but it is something. |
Those 272 are a real concern, as this change is all about saving a few bytes of flash. How common is 64 bit division in embedded software? Having used ATmega before, even 32bit arithmetic sometimes feels like a luxury. 😄 If most real-world code ends up using it, the proposed optimization would be counterproductive. In theory one could implement both approaches and let the user chose, but most users would not care at all, so having this choice would only be confusing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some unit tests / doctests on fractional_div would be useful I think, to test all the corner cases.
I think I'm happy with this though. 64 bit maths is fairly uncommon so most people benefit. I wonder though if we can use const fn to make this all go away though? I mean, the crystal frequency is always known and the sysclk is almost always known at compile time. |
I agree. I did some tests myself by pasting both the old and the new code into the same binary, throwing millions of random numbers at them, and comparing the results - and caught some subtle bugs in earlier versions of that patch.
I thought the same yesterday, when I saw embassy-rs/embassy@640ddc9.
For |
66d9b13
to
eb83c3d
Compare
Latest changes look good |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks ready to merge to me.
Are you okay with this being merged @jannic or are you still thinking about updating it further?
It can be merged. I don't have actionable ideas on how to improve this further, at the moment. |
Would you mind resolving the merge conflict on the changelog? |
This saves about 1kB of flash by removing compiler_builtins::int::specialized_div_rem::u64_div_rem if no other code uses u64 divisions.
eb83c3d
to
00b49d5
Compare
Ok, done |
Looks like clippy has a sad |
Yes - but not related to this pull request. |
While looking at the code generated by #288, I wondered why there were references to 64bit division code (
compiler_builtins::int::specialized_div_rem::u64_div_rem
) for a basically empty firmware.Turned out that
init_clocks_and_plls()
contained 64bit integer divisions.As the code to do those divisions takes about 1kB even in fully optimized builds, I changed the clock calculation code to use 32bit divisions, instead.
To keep the code small, I accepted some minimal additional rounding error: In case the remainder of the division is bigger than 2^24, the lower 8 bits will be thrown away, causing a relative rounding error of at most 2^-16.
(To be exact, instead of the expected rounding-down of normal integer division, the value might get rounded up, instead. The resulting difference from the true value might be even smaller than before.)