Inconsistent FFT Results Across Different Optimization Levels on ESP32 Platform #180
Comments
I can reproduce this; it looks like a misoptimization with hardware floats. Switching to soft floats (disabling floating-point hardware acceleration) generates the correct response regardless of the opt-level. Do you think you could reduce the test case and pinpoint which float operations are going wrong? I can then take a further look with that knowledge in hand.
Thank you! Alright, I'll try to identify the float operation or the combination of operations.
Ok, I was able to isolate the case a bit more; however, I guess more questions than answers remain.
Output:
See also here: https://github.com/ramtej/esp32-compiler-marvels-std-rs/blob/1032788fbcbf3733c832a0e3af039c00ba79f802/src/main.rs#L73 The problem occurs when two complex f32 numbers are multiplied, but only if one of these numbers is in an array (or vector?), as seen in the example above. Another condition is that this vector must be referenced beforehand, e.g. with a println! or a compiler hint. Without this, the correct value is calculated. The delta is in the imaginary part of the complex number; the two results differ only in sign. Thank you,
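A hypothetical minimal repro of the pattern described above (the Complex type, the values, and the println! hint are illustrative reconstructions, not the code from the linked repository):

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Complex {
    re: f32,
    im: f32,
}

fn mul(a: Complex, b: Complex) -> Complex {
    Complex {
        re: a.re * b.re - a.im * b.im,
        im: a.re * b.im + a.im * b.re,
    }
}

fn main() {
    let twiddles = vec![Complex { re: 0.8, im: -0.6 }];
    // Referencing the vector first (e.g. printing it) changed codegen
    // enough to expose the bug on xtensa with hardware floats:
    println!("{:?}", twiddles);
    let x = Complex { re: 1.0, im: 2.0 };
    let y = mul(x, twiddles[0]);
    // On the affected toolchain, the sign of y.im came out flipped.
    println!("{:?}", y);
}
```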
I was able to narrow down the topic further:
Output:
On the other hand, if I perform the same calculation without a struct, the correct values are calculated.
The struct + Vec, together with println! or the compiler hint, leads to the issue.
Summary
Is there anything else I can do? From my point of view, the issue is quite fundamental, and I'm honestly surprised that it hasn't been discovered before. A regression test, a fuzzer, and/or quickcheck should have found it. What can I do? Thank you.
I was going to submit an issue but this seems like the same problem as I've got. So I will give my example to reproduce the error:
Output:
rustc 1.70.0
I can reproduce it. It's even more narrowed down, thx! https://rust.godbolt.org/ -> do we get further there, e.g. to see what the compiler actually generates? Any other ideas?
It is not my specialization, so I won't be able to help further.
The neat thing about it is that we can "force" the correct sign by changing
which gives a very small assembly difference:
Hmm, are we about to invent new arithmetic? The problem is transitive, meaning it can occur in dependent crates.
Can confirm I have this issue, and have had it multiple times in the past. Soft FP always fixes it, but I would like the performance, heh. In my (current) case, it shows up when doing a final twiddle step in an RDFT:

```rust
fn twiddle<const N: usize>(twiddles: &[C; N], left: &mut [C; N], right: &mut [C; N]) {
    for ((&twiddle, left), right) in twiddles
        .iter()
        .zip(left.iter_mut())
        .zip(right.iter_mut().rev())
    {
        // Butterfly: sum and difference of the paired bins.
        let sum = *left + *right;
        let diff = *left - *right;
        let twiddled_re_sum = sum * twiddle.re;
        let twiddled_im_sum = sum * twiddle.im;
        let twiddled_re_diff = diff * twiddle.re;
        let twiddled_im_diff = diff * twiddle.im;
        let half_sum_re = 0.5 * sum.re;
        let half_diff_im = 0.5 * diff.im;
        // These two values are combined anti-symmetrically below.
        let output_twiddled_real = twiddled_re_sum.im + twiddled_im_diff.re;
        let output_twiddled_im = twiddled_im_sum.im - twiddled_re_diff.re;
        *left = C {
            re: half_sum_re + output_twiddled_real,
            im: half_diff_im + output_twiddled_im,
        };
        *right = C {
            re: half_sum_re - output_twiddled_real,
            im: output_twiddled_im - half_diff_im,
        };
    }
}
```

If I don't put logging in the middle, it will sometimes decide to send the values to the wrong places (swap the im values of left and right, for instance). This makes it apparent that this is a problem in basic arithmetic (an addition becoming a subtraction), since in this case the calculations are anti-symmetric.
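The snippet above assumes a small Copy complex type C with the usual operator overloads; a minimal sketch of such a type (a reconstruction, not the poster's actual definitions):

```rust
use core::ops::{Add, Mul, Sub};

#[derive(Clone, Copy)]
struct C {
    re: f32,
    im: f32,
}

impl Add for C {
    type Output = C;
    fn add(self, o: C) -> C { C { re: self.re + o.re, im: self.im + o.im } }
}

impl Sub for C {
    type Output = C;
    fn sub(self, o: C) -> C { C { re: self.re - o.re, im: self.im - o.im } }
}

// Scaling by a real factor, as used in `sum * twiddle.re` above.
impl Mul<f32> for C {
    type Output = C;
    fn mul(self, s: f32) -> C { C { re: self.re * s, im: self.im * s } }
}
```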
@Kurielu @ramtej can you try to reproduce on https://github.com/esp-rs/rust-build/releases/tag/v1.71.0.0? I don't have any reason to suspect it got fixed by itself, but who knows.
I can't check on a real esp32s3 right now, but I read the compiled assembly for

```rust
#[inline(never)]
fn math(x: f32, y: f32) -> f32 {
    0.99 * y - x
}
```

From what I can see, it translates to:

Read 0.99 into a8

So it should return x - y * 0.99 instead of 0.99 * y - x. I'll check that the madd.s/msub.s instruction encoding of the registers isn't flipped.
Still the same.

```rust
#[inline(never)]
fn math(x: f32, y: f32) -> f32 {
0.99 * y - x
}
#[entry]
fn main() -> ! {
let a = 0.0;
let p = 0.99f32;
let v = math(a, p);
println!("{}", v);
loop {}
}
```
In my case it breaks regardless. The instruction works like fr - (fs * ft); it doesn't support (fx * fy) - fz, so it makes sense that it's used this way, but it seems like a negation step is missing.
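Concretely, with the values from the snippet above (a = 0.0 and p = 0.99, so x = 0.0 and y = 0.99 inside math), the missing negation flips the sign of the result. A host-runnable sketch of the arithmetic:

```rust
fn main() {
    let (x, y) = (0.0f32, 0.99f32);
    // What the source asks for:
    let expected = 0.99 * y - x; //  0.9801
    // What you effectively get if msub.s (fr = fr - fs*ft) is emitted
    // without the final negation:
    let miscompiled = x - 0.99 * y; // -0.9801
    assert_eq!(expected, -miscompiled);
    println!("expected {expected}, miscompiled {miscompiled}");
}
```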
```rust
#[inline(never)]
fn math(x: f32, y: f32) -> f32 {
unsafe { core::intrinsics::fmaf32(0.99, y, -x) }
}
```

Generates:
Xor the top bit of x to negate it, and write it to f8.
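For reference, a stable-Rust equivalent of the intrinsic above would be f32::mul_add (assuming std is available; whether it compiles down to a single fused instruction is target-dependent):

```rust
#[inline(never)]
fn math(x: f32, y: f32) -> f32 {
    // fused multiply-add: 0.99 * y + (-x) == 0.99 * y - x
    0.99f32.mul_add(y, -x)
}
```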
@ramtej @MabezDev @Kurielu This code seems to have been copied with minor modifications from MIPS code, which doesn't have a fused multiply-subtract instruction, so it handles only the FMA case. Working on a fix that I will submit as a PR to espressif/llvm-project shortly.
Thanks a lot - you are a hero :-)
Sorry folks, I've been on vacation. Thanks for looking into this, and thank you @zRedShift for figuring out a solution! I'll make sure we get that LLVM PR reviewed ASAP and cut a patch release of the Rust compiler.
I tried my FFT twiddling code from earlier with a newly built version of 1.71.0.1 with my branch of LLVM (rust-esp), and it gives the correct results, so all good on that point. And I finally understand why the floating-point values sometimes became correct when printing them (a heisenbug that eluded me for a while, since this miscompilation isn't new). The function
The esp 1.72 release is now available as a pre-release, and includes @zRedShift's LLVM patches: https://github.com/esp-rs/rust-build/releases/tag/v1.72.0.0
Hello, I can confirm this, and my application now also works with hardware FP - I can measure a performance increase of ~3.6x. Thanks again for the support. I will close this issue now. See you all at the next ticket/issue, lol. Regards,
Description
I am comparing various Rust FFT implementations on both native (x86) and ESP32 (xtensa/esp32s3) platforms. The goal is to observe the performance and accuracy of the FFT libraries. I use 'std' Rust code on the embedded ESP platform via the Rust bindings for ESP-IDF (Espressif's IoT Development Framework).
The following FFT libraries are currently used for testing:
Input and Expected Result
I generate a simple sinusoidal time series that's small enough to fit into the RAM of the ESP32 platform. This is then processed with a forward FFT, and the normalized amplitude spectrum is calculated. For x86 and ESP32, and for the different FFT libraries, the expected results should be as follows:
Actual Result
However, when the compiler optimization level is changed, I notice some inconsistencies in the results for the ESP32 platform (xtensa). This problem does not occur on the x86 platform.
For example, with opt-level = 0, MicroFFT produces an unexpected value at microfft.amplitudes[5] = 1, while with opt-level = 1, unexpected values are produced at various frequencies, as well as a frequency shift in RealFFT's peak. The same issue appears with opt-level = 3.
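For reference, the opt-level in question is the one set via Cargo profiles; an illustrative snippet (not the repository's actual manifest):

```toml
# Cargo.toml
[profile.release]
opt-level = 1   # results compared against opt-level 0 and 3
```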
This issue needs to be addressed to ensure the FFT results are consistent across all platforms and compiler optimization levels.
Steps to Reproduce
I have already implemented the above steps in the following repository:
https://github.com/ramtej/esp32-compiler-marvels-std-rs
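A minimal host-side sketch of the pipeline described above, using microfft's real FFT (the sample count, target bin, and normalization are illustrative assumptions; the repository is authoritative):

```rust
use microfft::real::rfft_64;

fn main() {
    // 64-sample sine at bin 5, small enough for ESP32 RAM.
    let mut samples: [f32; 64] = core::array::from_fn(|i| {
        (2.0 * core::f32::consts::PI * 5.0 * i as f32 / 64.0).sin()
    });

    // Forward real FFT (in-place), yielding 32 complex bins.
    let spectrum = rfft_64(&mut samples);

    // Normalized amplitude: |X[k]| scaled so a unit sine peaks at ~1.0.
    let amplitudes: Vec<f32> = spectrum.iter().map(|c| c.norm() / 32.0).collect();

    // Expect a single peak of ~1.0 at index 5 on every platform/opt-level.
    println!("{:?}", amplitudes);
}
```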
Thank you,
Jiri