-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Connect Bootrom Instrinsics #288
Conversation
Fix a few function signatures that don't match the ROM according to the datasheet. _memset4 also has a wrong code in the datasheet, so match it to the actual ROM.
intrinsics! { | ||
#[slower_than_default] | ||
#[bootrom_v2] | ||
#[alias = __eqsf2, __ltsf2, __nesf2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand those aliases. Aren't those different operations (equal, less than, not equal and less-or-equal?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but the definition of the functions is things like "zero if strictly equal" or "negative if less" so the allow any other value if the condition isn't met. compiler-builtins does the same thing, which is actually where I got the logic from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, fine, I think now I understand: The underlying function (bootrom function _fcmp
, softfloat library call __cmpsf2
) distinguishes three cases, <
, ==
, >
. The library also provides several alternatives which don't guarantee to distinguish all three cases, but just separate subsets. I guess they are separate in case the hardware allows for faster implementations if you are only interested in such a subset.
Thanks for those pointers!
rp2040-hal/src/intrinsics.rs
Outdated
/// entirely. | ||
/// * `bootrom_v2` - indicates that the override is only available | ||
/// on a V2 bootrom or higher. Currently this is ignored and the | ||
/// override is always used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens on chips with V1 bootrom?
If it causes a crash or undefined behavior, I'd say that this is the wrong default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, I don't have a chip with a V1 bootrom to test. I suspect it would crash: the table resolve would probably return 0, so you'd probably get a hard fault exception (which would panic). However, I'm not actually sure how common V1 bootroms are (i.e. where they ever actually public?). The Pico SDK has logic that I'm not clear about: it only inserts the fallbacks if both V1 ROM and B0 chip support is enabled.
If they actually exist in the wild, then yeah I'd agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand, V1 ROM is equivalent to B0 chip. (B0 has ROM v1, B1 has ROM v2, B2 has ROM v3)
I didn't find any hard numbers, but after some searching I'd say that the original RP PICO boards sold in February 2021 were based on RP2040 revision B0, so there are probably quite a few in the wild.
When I'm back home I'll check which revisions I have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, probably best to make non-default then, so updated with them gated by the feature rom-v2-intrinsics
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just had a look, the first rp pico boards I bought really contained b0 stepping chips. From a local shop, no special early samples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I confirm the expected behavior, more or less:
Building a firmware with rom-v2-intrinsics
enabled, and flashing it to a RP2040 B0, causes a crash when executing a division.
What actually happens is not even a jump to address 0. Instead, the lookup for the soft_double_table
returns 0, and then the ddiv
method is looked up at an offset from that position. Incidently, the offset of the ddiv
method is 0x0c
, and the memory location 0x0c
happens to contain a pointer to the HardFault handler in ROM, which is just a loop { wfi() }
. Therefore, the debugger doesn't even show a crash.
I wonder if it would be better to somehow detect that situation and actively call panic!(), instead of just running into some random undefined behavior?
BTW, also tried on a RP2040 B1, works as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well one idea might be to extend the ::ptr()
method of all ROM functions that require a V2 bootrom to check the bootrom version and panic if they're called at all on a V1. The check is pretty simple: just fetching the ROM at 0x13 and comparing the integer number. I'd probably just add an attribute for the ROM functions definitions and check it in a modified lookup body, and you could also add it to the general case for soft_double_table()
.
The version might be a good idea to expose in general as part of the ROM functions. Probably orthogonal to this PR, though, since it's a general problem with the ROM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I added a check for a V1 bootrom in the lookup for the floating point functions. So you should get a more sensible panic if you try and use them on an unsupported version, either explicitly or implicitly through an intrinsic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, works (panics) as expected:
INFO Program start
└─ rp2040_project_template::__cortex_m_rt_main @ src/main.rs:34
ERROR panicked at 'Double precision operations require V2 bootrom (found: V1)', /home/jan/rp2040/rp-rs/rp-hal/rp2040-hal/src/rom_data.rs:384:9
└─ panic_probe::print_defmt::print @ /home/jan/.cargo/registry/src/github.com-1ecc6299db9ec823/panic-probe-0.3.0/src/lib.rs:91
What's the correct way to make CI happy here? It looks like it's failing because it's compiling for a |
Add a check for a V2 bootrom when using floating point functions that require it. Panic when it's not present.
7d7cc99
to
375d9c0
Compare
I don't know much about those soft float routines (as obvious from my comment. So I can't really tell if the code is correct. One thing I'd add to the list of 'possible future work' is the handling of older bootrom versions. With the checks added one gets a proper panic message at runtime, which is fine. But it would be even better to get this panic message at the start of As an alternative, one could detect that case and fall back to the slower compiler-builtins at runtime. Doing that without a performance impact at every call would be tricky, and the resulting firmware would always be larger. So I'm not sure if it's worth the effort. But it would be a nice default setting: Works on every revision, and uses the faster ROM routines if available. Regarding the CI failure, I guess adding |
Good idea, I've added some thoughts on that. |
Yep, sounds fine to me.
|
Looks good to me, though I would feel more comfortable with it if you could turn all ROM intrinsics off with a feature so we could toggle it quickly to test for perf / correctness issues. |
Add a standardized macro for intrinsics export and connect the simple ROM functions to intrinsics.
Hook up the ROM functions to floating point intrinsics.
375d9c0
to
80c84b1
Compare
Updated with the intrinsic generation controlled by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Will raise an issue to discuss whether we should switch the subtractive feature (disable_intrinsics) with an additive one (enable_intrinsics) that is in default_features.
It is generally preferred to have 'features' that add features rather than remove features, but I feel that isn't sufficient cause to not merge this.
This connects the ROM functions to intrinsics, including both basic ones (e.x. memset) and the floating point library.
Floating Point Correctness
The floating point ROM functions do not handle or generate NaNs. In the Pico SDK, NaN correctness is enabled with
PICO_FLOAT_PROPAGATE_NANS
(on by default), which enables NaN correctness by checking arguments before handing off to the ROM functions. So this implements similar logic around the affected functions.Performance
Most of the ROM functions are faster than the default implementations in compiler-builtins. However, without ROM function caching (the
rom-func-cache
feature), the total performance is usually a bit of a wash (±2% for floating point operators). With ROM function caching, they're usually 10-20% faster. The exception is division, which is nearly twice as fast (and always a gain); probably due to efficient use of the hardware divider. For those intrinsics that are not at least as good, they are decorated with the non-standard attribute#[slower_than_default]
. Currently, this just makes the intrinsics macro not actually declare the intrinsic; but in the future this could be made optional (e.x. to save flash or because the ROM functions are always a gain on a XIP cache miss).Additionally, without NaN handling they gain 5-10% in general. So that would make many of them always a gain.
Floating Point Functions
There are intrinsics defined for floating point functions (sin, cos, exp, etc), not just the operators. However, these are not currently accessible. This is because
#![no_std]
currently disables things likef32::sin()
(see rust-lang/rfcs#2505). The current workaround to that is just to use libm functions directly, so that can just be substituted for using the ROM functions directly. Many of these are significantly slower than the libm implementations anyway (probably due to ROM space constraints), so it's not a huge loss.The intrinsic names mirror compiler-builtins own definitions for them, so hopefully when core picks up math support, they will be used automatically.
Bootrom Versions
The double precision functions and some of the single precision conversion/atan2 are only available on V2 or greater of the bootrom. The intrinsics that use thse are decorated with the non-standard attribute
#[bootrom_v2]
.The macro currently just always enables these. My logic here is that this is basically the same as how we were already exposing them in the ROM function definitions anyway. However, it might make sense to make this optional.These are only enabled when the featurerom-v2-intrinsics
is set. This also adds a check for the bootrom version when using any of the V2 only functions (explicitly or implicitly (setting the feature and using a V1 bootrom)), and it will now panic in a sensible way instead of producing undefined behavior when one is used.The Pico SDK handles this case with fairly complicated trampoline/SVC exception scheme to automatically insert software code. I figured this was well out of scope for this PR.
Possible Future Work
This PR will work on its own but there's some things that could be improved. I could certainly make some progress on these, but I wanted to get the basics out first.
#[slower_than_default]
intrinsics controllable. The argument here is that using them saves flash space, and they're still faster in the event of an XIP cache miss. So for large programs with only occasional floating point use, it could make sense.MakeControlled by the#[bootrom_v2]
intrinsics controllable.rom-v2-intrinsics
feature.PICO_FLOAT_PROPAGATE_NANS
. Most of the time "non-finite" is sufficient, and the ROM functions do propagate that. This would represent a speedup of all floating point operations.rom-v2-intrinsics
set and using an unsupported operation on a V1 chip). It would be better to panic sooner (i.e. on#[entry]
), but I'm not sure how you'd detect if an unsupported operation was used at all. Maybe you could do something with having every V2 function reference some linker symbol and look for if it's been GCed during linking (and panic if it hasn't), but I'm not really sure how you'd actually test for that. The even better option would be to do what the Pico SDK does and fall back to flash implementations. This would probably be easier to implement in the detection sense: just provide alternatives to the V2 functions and have their::ptr()
implementations return the alternative on a V1 bootrom. The hard part here is providing the alternative implementations: you can't (?) reference compiler-builtins as an actual crate dependency, because doing so makes it be linked before rp-hal and that messes up the magic that's used to allow for intrinsic overrides.Closes #237