Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lent + tuple destructuring in VM iterators give random GC crashes (stack addresses escape) #17527

Closed
1 task done
timotheecour opened this issue Mar 26, 2021 · 4 comments · Fixed by #24119
Closed
1 task done
Labels

Comments

@timotheecour
Copy link
Member

timotheecour commented Mar 26, 2021

#17525 fails in CI, after investigation it fails with:

nim c tests/tmatching.nim hangs or gives SIGSEGV

which indicates a nim bug; and in fact it's a gc bug:

tests/tmatching.nim is a large file (2698 LOC), so i tried mimizing a bit to fusion/tests/tmatching6.nim (138 LOC), see https://gist.github.com/timotheecour/f8ce7fc19199df35722c15a916a96ab8 but the smaller the file the more rare the bug occurs when you run compile it, so to reproduce this reliably, use the original file tests/tmatching.nim; seemingly unrelated changes (eg adding comments or commenting out unused code) lead to reproducing the bug or not so it's hard to minimize.

to reproduce reliably:

  • checkout items(array) now returns lent T timotheecour/Nim#672 (has a single modification:
    iterator items*[IX, T](a: array[IX, T]): T {.inline.} = => iterator items*[IX, T](a: array[IX, T]): lent2 T {.inline.} =)
  • compile debug nim, eg:
    nim c -o:bin/nim_D20210326T125756 -d:useSysAssert -d:release --stacktrace compiler/nim
  • compile test:
    $nim_prs_D/bin/nim_D20210326T125756 c tests/tmatching.nim
  • this crashes:
[SYSASSERT] rawAlloc 8
Process 18945 exited with status = 1 (0x00000001)

stacktrace with lldb:

lldb -o "b exit" -o run -- $nim_prs_D/bin/nim_D20210326T125756 c tests/tmatching.nim

[SYSASSERT] rawAlloc 8
Process 22751 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x00007fff6a5d03db libsystem_c.dylib`exit
libsystem_c.dylib`exit:
->  0x7fff6a5d03db <+0>: pushq  %rbp
    0x7fff6a5d03dc <+1>: movq   %rsp, %rbp
    0x7fff6a5d03df <+4>: pushq  %rbx
    0x7fff6a5d03e0 <+5>: pushq  %rax
Target 0: (nim_D20210326T125756) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  frame #0: 0x00007fff6a5d03db libsystem_c.dylib`exit
  frame #1: 0x000000010000de50 nim_D20210326T125756`rawAlloc_system_4603 + 3360
  frame #2: 0x00000001000170e0 nim_D20210326T125756`rawNewObj_system_5690 + 240
  frame #3: 0x0000000100016fa9 nim_D20210326T125756`newObj + 153
  frame #4: 0x00000001000bbe6e nim_D20210326T125756`newNodeIT_ast_1258 + 270
  frame #5: 0x000000010030b7f9 nim_D20210326T125756`getNullValue_vmgen_2348 + 937
  frame #6: 0x000000010030af09 nim_D20210326T125756`getNullValueAux_vmgen_2629 + 729
  frame #7: 0x000000010030b2a9 nim_D20210326T125756`getNullValueAux_vmgen_2629 + 1657
  frame #8: 0x000000010030b3a1 nim_D20210326T125756`getNullValueAux_vmgen_2629 + 1905
  frame #9: 0x000000010030b2a9 nim_D20210326T125756`getNullValueAux_vmgen_2629 + 1657
  frame #10: 0x000000010030b768 nim_D20210326T125756`getNullValue_vmgen_2348 + 792
  frame #11: 0x0000000100380962 nim_D20210326T125756`rawExecute_vm_768 + 128466
  frame #12: 0x000000010038e4c6 nim_D20210326T125756`evalMacroCall_vm_13677 + 6166
  frame #13: 0x00000001003c073f nim_D20210326T125756`semMacroExpr_sem_400 + 1327
  frame #14: 0x00000001003f52c7 nim_D20210326T125756`afterCallActions_sem_16160 + 407
...

when compiling nim with -g:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  frame #0: 0x00007fff6a5d03db libsystem_c.dylib`exit
    frame #1: 0x000000010000dde0 start-addr:0x000000010000ddaa  nim_D20210326T125756.2 rawAlloc_system_4603(a=<unavailable>, requestedSize=<unavailable>) + 3360
    frame #2: 0x0000000100017070 start-addr:0x0000000100017065  nim_D20210326T125756.2 rawNewObj_system_5690(typ=0x0000000100766a00, size=<unavailable>, gch=0x000000010074ba28) + 240 at  /Users/timothee/git_clone/nim/Nim_devel/lib/system/gc.nim:458:24
    frame #3: 0x0000000100016f39 start-addr:0x0000000100016f27  nim_D20210326T125756.2 newObj(typ=0x0000000100766a00, size=40) + 153 at  /Users/timothee/git_clone/nim/Nim_devel/lib/system/gc.nim:486:11
    frame #4: 0x00000001000ba9b2 start-addr:0x00000001000ba9a1  nim_D20210326T125756.2 newNode_ast_1204(kind='&') + 146 at  /Users/timothee/git_clone/nim/Nim_prs/compiler/ast.nim:1163:174
    frame #5: 0x000000010035a0cc start-addr:0x000000010035a0c7  nim_D20210326T125756.2 copyValue_vm_369(src=0x000000010eb117d0) + 188 at  /Users/timothee/git_clone/nim/Nim_prs/compiler/vm.nim:193:11
    frame #6: 0x000000010035ab88 start-addr:0x000000010035ab83  nim_D20210326T125756.2 copyValue_vm_369(src=<unavailable>) + 2936 at  /Users/timothee/git_clone/nim/Nim_prs/compiler/vm.nim:210:59
    frame #7: 0x000000010035ab88 start-addr:0x000000010035ab83  nim_D20210326T125756.2 copyValue_vm_369(src=<unavailable>) + 2936 at  /Users/timothee/git_clone/nim/Nim_prs/compiler/vm.nim:210:59
    frame #8: 0x000000010036f8e1 start-addr:0x000000010036f8cb  nim_D20210326T125756.2 rawExecute_vm_768(c=0x0000000106bca210, start=<unavailable>, tos=0x000000010ebc1c90, Result=0x00007ffeefbf6ff0) + 58769 at  /Users/timothee/git_clone/nim/Nim_prs/compiler/vm.nim:1117:71
    frame #9: 0x000000010038e486 start-addr:0x000000010038e481  nim_D20210326T125756.2 evalMacroCall_vm_13677(module=0x000000010a2ed701, idgen=0x0000000106bca270, g=<unavailable>, templInstCounter=0x000000010e813210, n=0x000000010e8130d0, nOrig=0x0000000106bca210, sym=0x000000010d043590) + 6166 at  /Users/timothee/git_clone/nim/Nim_prs/compiler/vm.nim:2323:81

Additional Information

1.5.1 3e03f67
fusion 24f04ab8e38435156f8f885dea0a57c18b02626b

TODO

@timotheecour
Copy link
Member Author

adding -d:useGcAssert improves diagnostic:

nim c -o:bin/nim_D20210326T125756.3 -d:useSysAssert -d:useGcAssert -d:release --stacktrace --debugger:native --excessivestacktrace --lineTrace -g --skipusercfg -d:nimCompilerStackraceHints --stacktracemsgs compiler/nim

it fails with: [GCASSERT] incRef: interiorPtrTraceback (most recent call last)

proc incRef(c: PCell) {.inline.} =
  gcAssert(isAllocatedPtr(gch.region, c), "incRef: interiorPtr")
  c.refcount = c.refcount +% rcIncrement
  # and not colorMask
  logCell("incRef", c)

stacktrace: https://gist.github.com/timotheecour/f1e3a97dff99a02046d23e2bb13af133

@timotheecour timotheecour changed the title gc bug: sysAssert(c.freeList.zeroField == 0, "rawAlloc 8") fails gc or compiler bug: gcAssert(isAllocatedPtr(gch.region, c), "incRef: interiorPtr") fails Mar 27, 2021
@timotheecour
Copy link
Member Author

timotheecour commented Mar 27, 2021

@Araq after a very painful reduction, I finally have a repro with 0 dependencies and 0 changes to stdlib:

when true:
  #[
  ]#
  iterator items2*[IX, T](a: array[IX, T]): lent T {.inline.} =
    var i = low(IX)
    if i <= high(IX):
      while true:
        yield a[i]
        if i >= high(IX): break
        inc(i)

  proc main() =
    for i in 0..<1000:
      for (key, val) in items2([("any", "bar")]):
        debugEcho i
        debugEcho key

  static: main()

nim c --lib:lib -o:bin/nim_temp1 -d:useSysAssert -d:useGcAssert -d:release --stacktrace --debugger:native --excessivestacktrace --lineTrace -g --skipusercfg compiler/nim

$nim_prs_D/bin/nim_temp1 c -d:case10 --skipparentcfg --skipusercfg $timn_D/tests/nim/all/t12092.nim

...
any
22
any
23
[GCASSERT] incRef: interiorPtrTraceback (most recent call last)
/Users/timothee/git_clone/nim/Nim_prs/compiler/nim.nim(125) nim
/Users/timothee/git_clone/nim/Nim_prs/compiler/nim.nim(84) handleCmdLine
/Users/timothee/git_clone/nim/Nim_prs/compiler/main.nim(242) mainCommand
/Users/timothee/git_clone/nim/Nim_prs/compiler/main.nim(213) compileToBackend
/Users/timothee/git_clone/nim/Nim_prs/compiler/main.nim(90) commandCompileToC
/Users/timothee/git_clone/nim/Nim_prs/compiler/modules.nim(178) compileProject
/Users/timothee/git_clone/nim/Nim_prs/compiler/modules.nim(97) compileModule
/Users/timothee/git_clone/nim/Nim_prs/compiler/passes.nim(180) processModule
/Users/timothee/git_clone/nim/Nim_prs/compiler/passes.nim(73) processTopLevelStmt
/Users/timothee/git_clone/nim/Nim_prs/compiler/sem.nim(626) myProcess
/Users/timothee/git_clone/nim/Nim_prs/compiler/sem.nim(594) semStmtAndGenerateGenerics
/Users/timothee/git_clone/nim/Nim_prs/compiler/semstmts.nim(2274) semStmt
/Users/timothee/git_clone/nim/Nim_prs/compiler/semexprs.nim(1042) semExprNoType
/Users/timothee/git_clone/nim/Nim_prs/compiler/semexprs.nim(2892) semExpr
/Users/timothee/git_clone/nim/Nim_prs/compiler/semstmts.nim(2216) semStmtList
/Users/timothee/git_clone/nim/Nim_prs/compiler/semexprs.nim(2819) semExpr
/Users/timothee/git_clone/nim/Nim_prs/compiler/semexprs.nim(2340) semWhen
/Users/timothee/git_clone/nim/Nim_prs/compiler/semexprs.nim(2892) semExpr
/Users/timothee/git_clone/nim/Nim_prs/compiler/semstmts.nim(2216) semStmtList
/Users/timothee/git_clone/nim/Nim_prs/compiler/semexprs.nim(2944) semExpr
/Users/timothee/git_clone/nim/Nim_prs/compiler/semstmts.nim(2172) semStaticStmt
/Users/timothee/git_clone/nim/Nim_prs/compiler/vm.nim(2218) evalStaticStmt
/Users/timothee/git_clone/nim/Nim_prs/compiler/vm.nim(2207) evalConstExprAux
/Users/timothee/git_clone/nim/Nim_prs/compiler/vm.nim(786) rawExecute
/Users/timothee/git_clone/nim/Nim_prs/lib/system/gc.nim(286) asgnRef
/Users/timothee/git_clone/nim/Nim_prs/lib/system/gc.nim(161) incRef

with -d:nimVMDebug: see https://gist.github.com/timotheecour/ade3904ff0c98a8652623dd94031dd25

debugging note

-d:nimVMDebug did not help with the original bug reduction from the initial complex exampl in fusion, as I had no idea where the bug occurred in user code and the bug happened somewhere inside a VM context, and the CT code ran for a very long time with -d:nimVMDebug (i stopped it after 10mn with no success), instead, this helped a ton:
adding:

echo "D20210326T193340: ", c.config$c.debug[pc]

right above when traceCode:

future PR should allow this (maybe even better, controlled by a if isDefined(c.config, "nimVMDebugLOC" so it can be controlled without recompiling nim, or even settable in a given context via {.define.} + {.undef.})

EDIT: done in #18244

@timotheecour timotheecour added the TODO: followup needed remove tag once fixed or tracked elsewhere label Mar 27, 2021
@timotheecour timotheecour changed the title gc or compiler bug: gcAssert(isAllocatedPtr(gch.region, c), "incRef: interiorPtr") fails lent + tuple destructuring in VM iterators give random GC crashes (stack addresses escape) Mar 27, 2021
@Araq Araq added the Severe label Mar 31, 2021
@timotheecour
Copy link
Member Author

update retrying snippet with

import std/vmutils
static: vmTrace(true)

now that #18244 was merged, it shows this user-code trace which is complementary to the compiler stacktrace:

...
/Users/timothee/git_clone/nim/Nim_devel/lib/system/iterators_1.nim(120, 11) [opcLtInt]   while i < b:
/Users/timothee/git_clone/nim/Nim_devel/lib/system/iterators_1.nim(120, 3) [opcFJmp]   while i < b:
/Users/timothee/git_clone/nim/timn/tests/nim/all/t12555.nim(19, 9) [opcAsgnInt]     for i in 0..<1000:
/Users/timothee/git_clone/nim/timn/tests/nim/all/t12555.nim(20, 12) [opcLdNull]       for (key, val) in items2([("any", "bar")]):
/Users/timothee/git_clone/nim/timn/tests/nim/all/t12555.nim(20, 17) [opcLdNull]       for (key, val) in items2([("any", "bar")]):
/Users/timothee/git_clone/nim/timn/tests/nim/all/t12555.nim(11, 16) [opcLdImmInt]     var i = low(IX)
/Users/timothee/git_clone/nim/timn/tests/nim/all/t12555.nim(12, 8) [opcConv]     if i <= high(IX):
/Users/timothee/git_clone/nim/timn/tests/nim/all/t12555.nim(12, 17) [opcLdImmInt]     if i <= high(IX):
/Users/timothee/git_clone/nim/timn/tests/nim/all/t12555.nim(12, 10) [opcLeInt]     if i <= high(IX):
/Users/timothee/git_clone/nim/timn/tests/nim/all/t12555.nim(12, 10) [opcFJmp]     if i <= high(IX):
/Users/timothee/git_clone/nim/timn/tests/nim/all/t12555.nim(20, 32) [opcLdNull]       for (key, val) in items2([("any", "bar")]):
/Users/timothee/git_clone/nim/timn/tests/nim/all/t12555.nim(20, 32) [opcLdNullReg]       for (key, val) in items2([("any", "bar")]):
/Users/timothee/git_clone/nim/timn/tests/nim/all/t12555.nim(20, 33) [opcLdNull]       for (key, val) in items2([("any", "bar")]):
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

@metagn metagn added stale Staled PR/issues; remove the label after fixing them works_but_needs_test_case labels Jan 4, 2024
@metagn
Copy link
Collaborator

metagn commented Jan 4, 2024

The "minimal reduction" above:

when true:
  #[
  ]#
  iterator items2*[IX, T](a: array[IX, T]): lent T {.inline.} =
    var i = low(IX)
    if i <= high(IX):
      while true:
        yield a[i]
        if i >= high(IX): break
        inc(i)

  proc main() =
    for i in 0..<1000:
      for (key, val) in items2([("any", "bar")]):
        debugEcho i
        debugEcho key

  static: main()

by itself, compiles and runs and seems to give the correct output on 2.0, the actual output is very long because of for i in 0..<1000. I'm guessing it was fixed by #19902 since that fixed #14553 which seems very similar.

Removing followup needed label because of comment above:

future PR should allow this ... EDIT: done in #18244

@metagn metagn removed the TODO: followup needed remove tag once fixed or tracked elsewhere label Jan 4, 2024
Araq pushed a commit that referenced this issue Aug 30, 2024
closes #1969, closes #7547, closes #7737, closes #11838, closes #12283,
closes #12714, closes #12720, closes #14053, closes #16118, closes
#19670, closes #22645

I was going to wait on these but regression tests even for recent PRs
are turning out to be important in wide reaching PRs like #24010.

The other issues with the working label felt either finnicky (#7385,
#9156, #12732, #15247), excessive to test (#12405, #12424, #17527), or I
just don't know what fixed them/what the issue was (#16128: the PR link
gives a server error by Github, #12457, #12487).
metagn added a commit to metagn/Nim that referenced this issue Sep 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants