Avoid 64bit division #291

Merged
merged 4 commits into from
Feb 26, 2022

Conversation

jannic
Member

@jannic jannic commented Feb 10, 2022

While looking at the code generated by #288, I wondered why there were references to 64bit division code (compiler_builtins::int::specialized_div_rem::u64_div_rem) for a basically empty firmware.

Turned out that init_clocks_and_plls() contained 64bit integer divisions.

As the code to do those divisions takes about 1kB even in fully optimized builds, I changed the clock calculation code to use 32bit divisions, instead.

To keep the code small, I accepted some minimal additional rounding error: In case the remainder of the division is bigger than 2^24, the lower 8 bits will be thrown away, causing a relative rounding error of at most 2^-16.
(To be exact, where normal integer division always rounds down, the value might get rounded up instead. The resulting difference from the true value might be even smaller than before.)
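
A minimal sketch of the idea described above, not the code from this pull request: compute a fixed-point quotient with 8 fractional bits using only 32-bit operations. The name `fractional_div` appears later in this thread, but the signature, the `Option` return type, and the reading that "the lower 8 bits" are dropped from the divisor are assumptions made for illustration.

```rust
/// Returns `numerator / denominator` as a fixed-point value with 8 fractional
/// bits, using only 32-bit divisions. `None` means the result cannot be
/// represented. (Hypothetical sketch; the signature is an assumption.)
fn fractional_div(numerator: u32, denominator: u32) -> Option<u32> {
    const LIMIT: u32 = 1 << 24;

    if denominator == 0 {
        return None;
    }

    let div_int = numerator / denominator;
    if div_int >= LIMIT {
        // The integer part would not fit into the upper 24 bits.
        return None;
    }

    let div_rem = numerator % denominator;
    let div_frac = if div_rem < LIMIT {
        // Small remainder: shifting left by 8 cannot overflow a u32.
        (div_rem << 8) / denominator
    } else {
        // Large remainder: shift the divisor right instead, dropping its
        // lower 8 bits. This can round the result up by one step.
        div_rem / (denominator >> 8)
    };

    Some((div_int << 8) + div_frac)
}
```

For example, `fractional_div(16_777_470, 16_777_471)` takes the second branch and yields 256, whereas exact 64-bit math would give 255, which illustrates the rounding-up mentioned above.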

@9names
Member

9names commented Feb 13, 2022

I tested this after rebasing onto the new main, since the new intrinsics code saves a decent chunk of code space too.

On a trivial project I tested, it's clearly a saving (240 bytes dev, 944 bytes release).
With a more complex project (defmt enabled, lots of printing, approx 22KB release and 27KB dev), it saved 288 bytes on the dev build but added 272 bytes in release (because it's no longer using the u64_div_rem that was already included due to math done elsewhere).

It's a trade-off. 272 bytes isn't a big chunk out of the 2MB on the Pico but it is something.
On the other hand, 944 bytes is a huge amount for a small firmware - my small project went from 5KB to 4KB - that might be a big deal if you were running entirely from RAM.

@jannic
Member Author

jannic commented Feb 13, 2022

Those 272 bytes are a real concern, as this change is all about saving a few bytes of flash.

How common is 64 bit division in embedded software? Having used ATmega before, even 32bit arithmetic sometimes feels like a luxury. 😄 If most real-world code ends up using it, the proposed optimization would be counterproductive.

In theory one could implement both approaches and let the user choose, but most users would not care at all, so having this choice would only be confusing.

Member

@thejpster thejpster left a comment

Some unit tests / doctests on fractional_div would be useful I think, to test all the corner cases.

@thejpster
Member

I think I'm happy with this though. 64 bit maths is fairly uncommon so most people benefit.

I wonder though if we can use const fn to make this all go away though? I mean, the crystal frequency is always known and the sysclk is almost always known at compile time.

@jannic
Member Author

jannic commented Feb 14, 2022

> Some unit tests / doctests on fractional_div would be useful I think, to test all the corner cases.

I agree. I did some tests myself by pasting both the old and the new code into the same binary, throwing millions of random numbers at them, and comparing the results - and caught some subtle bugs in earlier versions of that patch.
Probably too much for a unit test, but having some tests is definitely a good idea.
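
The comparison described above could be sketched roughly as follows. This is a host-side illustration, not the test that was actually run: `reference_div`, `candidate_div`, `compare_random`, and the xorshift generator are hypothetical stand-ins, and the candidate assumes that the divisor's lower 8 bits are dropped when the remainder is 2^24 or larger.

```rust
/// Reference: widen to 64 bits, then divide.
fn reference_div(numerator: u32, denominator: u32) -> Option<u32> {
    if denominator == 0 {
        return None;
    }
    let wide = (u64::from(numerator) << 8) / u64::from(denominator);
    if wide > u64::from(u32::MAX) { None } else { Some(wide as u32) }
}

/// Candidate: 32-bit divisions only (assumed variant, see the text above).
fn candidate_div(numerator: u32, denominator: u32) -> Option<u32> {
    if denominator == 0 {
        return None;
    }
    let int = numerator / denominator;
    if int >= (1 << 24) {
        return None;
    }
    let rem = numerator % denominator;
    let frac = if rem < (1 << 24) {
        (rem << 8) / denominator
    } else {
        rem / (denominator >> 8)
    };
    Some((int << 8) + frac)
}

/// Tiny xorshift32 generator so the sketch needs no external crates.
fn xorshift32(state: &mut u32) -> u32 {
    *state ^= *state << 13;
    *state ^= *state >> 17;
    *state ^= *state << 5;
    *state
}

/// Returns (exact matches, results one step above the reference, anything else).
fn compare_random(iterations: u32, seed: u32) -> (u32, u32, u32) {
    let mut state = seed.max(1); // xorshift must not start at zero
    let (mut exact, mut rounded_up, mut other) = (0, 0, 0);
    for _ in 0..iterations {
        let n = xorshift32(&mut state);
        let d = xorshift32(&mut state);
        match (reference_div(n, d), candidate_div(n, d)) {
            (r, c) if r == c => exact += 1,
            (Some(r), Some(c)) if c.wrapping_sub(r) == 1 => rounded_up += 1,
            _ => other += 1,
        }
    }
    (exact, rounded_up, other)
}
```

In a run of this sketch, a third counter that stays at zero indicates that the 32-bit version never deviates from the 64-bit reference by more than one step in the last fractional bit.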

> I wonder though if we can use const fn to make this all go away though?

I thought the same yesterday, when I saw embassy-rs/embassy@640ddc9.
But init_clocks_and_plls()/configure_clock() have side effects and so can't be a const fn. And inside those functions, the frequency is no longer a constant known in advance.
Some API like this could work, but would be ugly:

  const fn calculate_divider(freq: u32, source_freq: u32) -> u32 { ... }

  // A const item needs an explicit type.
  const DIVIDER: u32 = calculate_divider( ... );
  [...]
  some_clock.configure( DIVIDER );

For init_clocks_and_plls one would need many dividers, so that would become even more ugly.
Perhaps the BSPs (where the crystal frequency can be assumed as constant) could provide some kind of init_clocks_and_plls_to_max_frequency() method which doesn't take any clock frequency parameters at all, so everything could be precalculated?
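
A self-contained sketch of the const fn approach outlined above. Everything here is hypothetical (`SomeClock`, the two frequencies, the 8-bit fractional format); the point is only that a 64-bit division inside a `const fn` should cost no flash when its result is bound to a `const`, because the compiler evaluates it at build time.

```rust
/// Computes `source_freq / freq` as a fixed-point divider with 8 fractional
/// bits. The 64-bit math is evaluated at compile time when the result is
/// bound to a `const`; calling this at runtime would still pull in the
/// 64-bit division routine.
const fn calculate_divider(freq: u32, source_freq: u32) -> u32 {
    let wide = ((source_freq as u64) << 8) / freq as u64;
    assert!(wide <= u32::MAX as u64, "divider does not fit into 32 bits");
    wide as u32
}

// Hypothetical frequencies, chosen only for illustration.
const SOURCE_HZ: u32 = 125_000_000;
const TARGET_HZ: u32 = 48_000_000;

// Evaluated by the compiler; a zero `freq` would be a build error here.
const DIVIDER: u32 = calculate_divider(TARGET_HZ, SOURCE_HZ);

/// Stand-in for a real clock peripheral (assumption, not a HAL type).
struct SomeClock {
    divider: u32,
}

impl SomeClock {
    /// Takes a precalculated divider, so no division happens at runtime.
    fn configure(&mut self, divider: u32) {
        self.divider = divider;
    }
}
```

The trade-off is the one noted above: each clock needs its own precalculated constant, and the frequencies must be known at compile time.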

@thejpster
Member

Latest changes look good

Member

@9names 9names left a comment

This looks ready to merge to me.
Are you okay with this being merged @jannic or are you still thinking about updating it further?

@jannic
Member Author

jannic commented Feb 26, 2022

It can be merged. I don't have actionable ideas on how to improve this further, at the moment.

@9names
Member

9names commented Feb 26, 2022

Would you mind resolving the merge conflict on the changelog?

This saves about 1kB of flash by removing compiler_builtins::int::specialized_div_rem::u64_div_rem if no other code uses u64 divisions.

@jannic
Member Author

jannic commented Feb 26, 2022

Ok, done

@thejpster
Member

Looks like clippy has a sad

@jannic
Member Author

jannic commented Feb 26, 2022

> Looks like clippy has a sad

Yes - but not related to this pull request.
Fixed the clippy warnings in #304

@9names 9names merged commit 111654f into rp-rs:main Feb 26, 2022