-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"fat"
LTO Miscompilation
#146
Comments
Likely related to a scheduling issue I've had a while back. Basically, the MPMC queue used for the scheduler is implemented as a ringbuffer using atomic operations, and that ringbuffer has 8 bit (i.e. 256) indices. Initially, @ivmarkov solved it by using the Since My issue occurred regardless of LTO mode, and someone (I can't remember who) tested it on the esp32c3 where it worked fine. Also note, that this issue even occurred in wokwi for me, so unlikely to be a hardware bug. |
@N3xed Do you know where the original discussion for the MPMC issue exists? I can't seem to find it across the various esp-rs repos. Might be helpful in producing a smaller example, especially if it was happening without LTO. |
Unfortunately, I've never created an issue for this, so this is the only place as far as I know. When I discovered this issue I worked around it, went on to other stuff, and never got around to it until @zRedShift brought it up in the matrix chat. There is my discussion about it with @ivmarkov in the matrix chat though. |
@MabezDev @N3xed @ivmarkov #![no_std]
#![no_main]
use esp_backtrace as _;
#[xtensa_lx_rt::entry]
fn main() -> ! {
let queue = heapless::mpmc::MpMcQueue::<(), 8>::new();
let mut counter = 0;
loop {
esp_println::println!("counter at {counter}");
queue.enqueue(()).unwrap();
queue.dequeue().unwrap();
counter += 1;
}
} Please look for more at: https://github.com/zRedShift/esp-spinlock-repro |
@MabezDev this is getting to the limit of what I can do, but compiling the following branch: https://github.com/zRedShift/esp-spinlock-repro/blob/without-heapless/src/main.rs (without thin LTO **************************************************************
* FUNCTION *
**************************************************************
uint __stdcall inner(uint param_1)
uint o2:4 <RETURN>
uint i2:4 param_1
_ZN18esp_spinlock_repro5inner17h372c0f243cfa6d XREF[3]: 420003b4(*),
esp_spinlock_repro::esp_spinlock_repro::inner __xtensa_lx_rt_main_trampoline:4
.debug_frame::00000d8c(*)
420036f4 36 81 00 entry a1,0x40
420036f7 81 1f f3 l32r a8,DAT_42000374 = 07060504h
420036fa 89 31 s32i.n a8,a1,0xc
420036fc 81 1f f3 l32r a8,DAT_42000378 = 03020100h
420036ff 89 21 s32i.n a8,a1,0x8
42003701 0c 08 movi.n a8,0x0
42003703 82 51 08 s16i a8,a1,0x10
42003706 0c 78 movi.n a8,0x7
42003708 89 11 s32i.n a8,a1,0x4
4200370a 0c 39 movi.n a9,0x3
4200370c a2 a0 ff movi a10,0xff
4200370f 7c fb movi.n a11,0xff
42003711 46 03 00 j LAB_42003722
LAB_42003714 XREF[1]: 420037ea(j)
42003714 82 c1 08 addi a8,a1,0x8
42003717 d0 88 20 or a8,a8,a13
4200371a c0 20 00 memw
4200371d 8b cc addi.n a12,a12,0x8
4200371f c2 48 00 s8i a12,a8,0x0
LAB_42003722 XREF[2]: 42003711(j), 42003805(j)
42003722 d2 01 11 l8ui a13,a1,0x11
42003725 88 11 l32i.n a8,a1,0x4
42003727 80 ed 10 and a14,a13,a8
4200372a 1b cd addi.n a12,a13,0x1
LAB_4200372c XREF[1]: 4200377d(j)
4200372c f2 c1 08 addi a15,a1,0x8
4200372f e0 ff 20 or a15,a15,a14
42003732 f2 0f 00 l8ui a15,a15,0x0
42003735 c0 20 00 memw
42003738 d7 1f 02 beq a15,a13,LAB_4200373e
4200373b 46 2f 00 j LAB_420037fc
LAB_4200373e XREF[1]: 42003738(j)
4200373e f2 c1 08 addi a15,a1,0x8
42003741 9b ff addi.n a15,a15,0x9
42003743 f0 69 10 and a6,a9,a15
42003746 60 7f c0 sub a7,a15,a6
42003749 d0 f6 11 slli a15,a6,0x3
4200374c 00 1f 40 ssl a15
4200374f 00 6a a1 sll a6,a10
42003752 b0 66 30 xor a6,a6,a11
42003755 58 07 l32i.n a5,a7,0x0
42003757 60 25 10 and a2,a5,a6
4200375a 00 5d a1 sll a5,a13
4200375d 00 4c a1 sll a4,a12
LAB_42003760 XREF[1]: 42003771(j)
42003760 8d 02 mov.n a8,a2
42003762 80 25 20 or a2,a5,a8
42003765 80 34 20 or a3,a4,a8
42003768 20 0c 13 wsr a2,SCOMPARE1
4200376b 32 e7 00 s32c1i a3,a7,0x0
4200376e 60 23 10 and a2,a3,a6
42003771 87 92 eb bne a2,a8,LAB_42003760
42003774 00 0f 40 ssr a15
42003777 30 80 91 srl a8,a3
4200377a a0 88 10 and a8,a8,a10
4200377d d7 98 ab bne a8,a13,LAB_4200372c
42003780 82 c1 08 addi a8,a1,0x8
42003783 e0 88 20 or a8,a8,a14
42003786 c0 20 00 memw
42003789 c2 48 00 s8i a12,a8,0x0
LAB_4200378c XREF[1]: 420037f6(j)
4200378c c2 01 10 l8ui a12,a1,0x10
4200378f 88 11 l32i.n a8,a1,0x4
42003791 80 dc 10 and a13,a12,a8
42003794 1b ec addi.n a14,a12,0x1
LAB_42003796 XREF[1]: 420037e7(j)
42003796 a0 8e 10 and a8,a14,a10
42003799 f2 c1 08 addi a15,a1,0x8
4200379c d0 ff 20 or a15,a15,a13
4200379f 62 0f 00 l8ui a6,a15,0x0
420037a2 c0 20 00 memw
420037a5 87 96 44 bne a6,a8,LAB_420037ed
420037a8 82 c1 08 addi a8,a1,0x8
420037ab 8b 88 addi.n a8,a8,0x8
420037ad 80 f9 10 and a15,a9,a8
420037b0 f0 78 c0 sub a7,a8,a15
420037b3 d0 ff 11 slli a15,a15,0x3
420037b6 00 1f 40 ssl a15
420037b9 00 8a a1 sll a8,a10
420037bc b0 58 30 xor a5,a8,a11
420037bf 88 07 l32i.n a8,a7,0x0
420037c1 50 28 10 and a2,a8,a5
420037c4 00 4c a1 sll a4,a12
420037c7 00 66 a1 sll a6,a6
LAB_420037ca XREF[1]: 420037db(j)
420037ca 8d 02 mov.n a8,a2
420037cc 80 24 20 or a2,a4,a8
420037cf 80 36 20 or a3,a6,a8
420037d2 20 0c 13 wsr a2,SCOMPARE1
420037d5 32 e7 00 s32c1i a3,a7,0x0
420037d8 50 23 10 and a2,a3,a5
420037db 87 92 eb bne a2,a8,LAB_420037ca
420037de 00 0f 40 ssr a15
420037e1 30 80 91 srl a8,a3
420037e4 a0 88 10 and a8,a8,a10
420037e7 c7 98 ab bne a8,a12,LAB_42003796
420037ea 86 c9 ff j LAB_42003714
LAB_420037ed XREF[1]: 420037a5(j)
420037ed e0 86 c0 sub a8,a6,a14
420037f0 80 88 01 slli a8,a8,0x18
420037f3 80 88 31 srai a8,a8,0x18
420037f6 d6 28 f9 bgez a8,LAB_4200378c
420037f9 c6 02 00 j LAB_42003808
LAB_420037fc XREF[1]: 4200373b(j)
420037fc d0 8f c0 sub a8,a15,a13
420037ff 80 88 01 slli a8,a8,0x18
42003802 80 88 31 srai a8,a8,0x18
42003805 d6 98 f1 bgez a8,LAB_42003722
LAB_42003808 XREF[1]: 420037f9(j)
42003808 1d f0 retw.n
4200380a 00 ?? 00h
4200380b 00 ?? 00h fat LTO **************************************************************
* esp_spinlock_repro::inner *
**************************************************************
uint __stdcall inner(uint param_1)
uint o2:4 <RETURN>
uint i2:4 param_1
_ZN18esp_spinlock_repro5inner17h4a0b3ef61100c1 XREF[3]: 42000068(*), main:42000681(c),
esp_spinlock_repro::inner
.debug_frame::0000001c(*)
4200045c 36 81 00 entry a1,0x40
4200045f 81 f0 fe l32r a8,_text_start = 07060504h
42000462 89 31 s32i.n a8,a1,0xc
42000464 81 f0 fe l32r a8,DAT_42000024 = 03020100h
42000467 89 21 s32i.n a8,a1,0x8
42000469 0c 08 movi.n a8,0x0
4200046b 82 51 08 s16i a8,a1,0x10
4200046e 0c 78 movi.n a8,0x7
42000470 89 11 s32i.n a8,a1,0x4
42000472 0c 39 movi.n a9,0x3
42000474 a2 a0 ff movi a10,0xff
42000477 7c fb movi.n a11,0xff
42000479 46 03 00 j LAB_4200048a
LAB_4200047c XREF[1]: 42000552(j)
4200047c 82 c1 08 addi a8,a1,0x8
4200047f d0 88 20 or a8,a8,a13
42000482 c0 20 00 memw
42000485 8b cc addi.n a12,a12,0x8
42000487 c2 48 00 s8i a12,a8,0x0
LAB_4200048a XREF[2]: 42000479(j), 4200056d(j)
4200048a d2 01 11 l8ui a13,a1,0x11
4200048d 88 11 l32i.n a8,a1,0x4
4200048f 80 ed 10 and a14,a13,a8
42000492 1b cd addi.n a12,a13,0x1
LAB_42000494 XREF[1]: 420004e5(j)
42000494 f2 c1 08 addi a15,a1,0x8
42000497 e0 ff 20 or a15,a15,a14
4200049a f2 0f 00 l8ui a15,a15,0x0
4200049d c0 20 00 memw
420004a0 d7 1f 02 beq a15,a13,LAB_420004a6
420004a3 46 2f 00 j LAB_42000564
LAB_420004a6 XREF[1]: 420004a0(j)
420004a6 f2 c1 08 addi a15,a1,0x8
420004a9 9b ff addi.n a15,a15,0x9
420004ab f0 69 10 and a6,a9,a15
420004ae 60 7f c0 sub a7,a15,a6
420004b1 d0 f6 11 slli a15,a6,0x3
420004b4 00 1f 40 ssl a15
420004b7 00 6a a1 sll a6,a10
420004ba b0 66 30 xor a6,a6,a11
420004bd 58 07 l32i.n a5,a7,0x0
420004bf 60 25 10 and a2,a5,a6
420004c2 00 5d a1 sll a5,a13
420004c5 00 4c a1 sll a4,a12
LAB_420004c8 XREF[1]: 420004d9(j)
420004c8 8d 02 mov.n a8,a2
420004ca 80 25 20 or a2,a5,a8
420004cd 80 34 20 or a3,a4,a8
420004d0 20 0c 13 wsr a2,SCOMPARE1
420004d3 32 e7 00 s32c1i a3,a7,0x0
420004d6 60 23 10 and a2,a3,a6
420004d9 87 92 eb bne a2,a8,LAB_420004c8
420004dc 00 0f 40 ssr a15
420004df 30 80 91 srl a8,a3
420004e2 a0 88 10 and a8,a8,a10
420004e5 d7 98 ab bne a8,a13,LAB_42000494
420004e8 82 c1 08 addi a8,a1,0x8
420004eb e0 88 20 or a8,a8,a14
420004ee c0 20 00 memw
420004f1 c2 48 00 s8i a12,a8,0x0
LAB_420004f4 XREF[1]: 4200055e(j)
420004f4 c2 01 10 l8ui a12,a1,0x10
420004f7 88 11 l32i.n a8,a1,0x4
420004f9 80 dc 10 and a13,a12,a8
420004fc 1b ec addi.n a14,a12,0x1
LAB_420004fe XREF[1]: 4200054f(j)
420004fe a0 8e 10 and a8,a14,a10
42000501 f2 c1 08 addi a15,a1,0x8
42000504 d0 ff 20 or a15,a15,a13
42000507 f2 0f 00 l8ui a15,a15,0x0
4200050a c0 20 00 memw
4200050d 87 9f 44 bne a15,a8,LAB_42000555
42000510 82 c1 08 addi a8,a1,0x8
42000513 8b 88 addi.n a8,a8,0x8
42000515 80 f9 10 and a15,a9,a8
42000518 f0 78 c0 sub a7,a8,a15
4200051b d0 ff 11 slli a15,a15,0x3
4200051e 00 1f 40 ssl a15
42000521 00 8a a1 sll a8,a10
42000524 b0 68 30 xor a6,a8,a11
42000527 88 07 l32i.n a8,a7,0x0
42000529 60 28 10 and a2,a8,a6
4200052c 00 5c a1 sll a5,a12
4200052f 00 4e a1 sll a4,a14
LAB_42000532 XREF[1]: 42000543(j)
42000532 8d 02 mov.n a8,a2
42000534 80 25 20 or a2,a5,a8
42000537 80 34 20 or a3,a4,a8
4200053a 20 0c 13 wsr a2,SCOMPARE1
4200053d 32 e7 00 s32c1i a3,a7,0x0
42000540 60 23 10 and a2,a3,a6
42000543 87 92 eb bne a2,a8,LAB_42000532
42000546 00 0f 40 ssr a15
42000549 30 80 91 srl a8,a3
4200054c a0 88 10 and a8,a8,a10
4200054f c7 98 ab bne a8,a12,LAB_420004fe
42000552 86 c9 ff j LAB_4200047c
LAB_42000555 XREF[1]: 4200050d(j)
42000555 e0 8f c0 sub a8,a15,a14
42000558 80 88 01 slli a8,a8,0x18
4200055b 80 88 31 srai a8,a8,0x18
4200055e d6 28 f9 bgez a8,LAB_420004f4
42000561 c6 02 00 j LAB_42000570
LAB_42000564 XREF[1]: 420004a3(j)
42000564 d0 8f c0 sub a8,a15,a13
42000567 80 88 01 slli a8,a8,0x18
4200056a 80 88 31 srai a8,a8,0x18
4200056d d6 98 f1 bgez a8,LAB_4200048a
LAB_42000570 XREF[1]: 42000561(j)
42000570 1d f0 retw.n
42000572 00 ?? 00h
42000573 00 ?? 00h Besides label, cross-references and debug-info, the only difference is the register allocation, which after equalizing the labels comes down to this diff: 79c79
< 4200379f 62 0f 00 l8ui a6,a15,0x0
---
> 4200379f f2 0f 00 l8ui a15,a15,0x0
81c81
< 420037a5 87 96 44 bne a6,a8,LAB_420037ed
---
> 420037a5 87 9f 44 bne a15,a8,LAB_420037ed
89c89
< 420037bc b0 58 30 xor a5,a8,a11
---
> 420037bc b0 68 30 xor a6,a8,a11
91,93c91,93
< 420037c1 50 28 10 and a2,a8,a5
< 420037c4 00 4c a1 sll a4,a12
< 420037c7 00 66 a1 sll a6,a6
---
> 420037c1 60 28 10 and a2,a8,a6
> 420037c4 00 5c a1 sll a5,a12
> 420037c7 00 4e a1 sll a4,a14
96,97c96,97
< 420037cc 80 24 20 or a2,a4,a8
< 420037cf 80 36 20 or a3,a6,a8
---
> 420037cc 80 25 20 or a2,a5,a8
> 420037cf 80 34 20 or a3,a4,a8
100c100
< 420037d8 50 23 10 and a2,a3,a5
---
> 420037d8 60 23 10 and a2,a3,a6
108c108
< 420037ed e0 86 c0 sub a8,a6,a14
---
> 420037ed e0 8f c0 sub a8,a15,a14 |
Narrowed it down to |
I ported your code to esp-rs/esp-hal@main...MabezDev:esp-hal:esp-spinlock-repro and retried (all examples are compiled with lto='fat'). This bug seems to be resolved. |
I tried this code (repo: esp-spinlock-repro):
When I'm running code I've compiled with "fat" LTO, on 1.64.0.0 or 1.65.0.0, with the
xtensa-esp32s3-espidf
target on my ESP32S3, regardless of theesp-idf
branch, my code hangs indefinitely after 256 iteratiots:If I disable LTO or move to "thin" LTO, the code keeps chugging along with no end in sight (what I expect to see).
Meta
rustc --version --verbose
:The text was updated successfully, but these errors were encountered: