Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nim crashes randomly #16613

Closed
yglukhov opened this issue Jan 6, 2021 · 13 comments · Fixed by #19902
Closed

Nim crashes randomly #16613

yglukhov opened this issue Jan 6, 2021 · 13 comments · Fixed by #19902
Assignees
Labels
Compiler Crash const `const x=expr` or `static: stmt` Converters VM see also `const` label

Comments

@yglukhov
Copy link
Member

yglukhov commented Jan 6, 2021

Steps to reproduce:

git clone https://github.com/yglukhov/iface
cd iface
git checkout c137f9330cc0bb59b1c9204fecd2358b5c156f52
while nim c tests/test1.nim; do true; done

Wait for 5 minutes
Actual result:
Nim eventually crashes with stack trace:

Traceback (most recent call last)
.../nim-devel/compiler/nim.nim(128) nim
.../nim-devel/compiler/nim.nim(87) handleCmdLine
.../nim-devel/compiler/main.nim(275) mainCommand
.../nim-devel/compiler/main.nim(245) compileToBackend
.../nim-devel/compiler/main.nim(101) commandCompileToC
.../nim-devel/compiler/modules.nim(178) compileProject
.../nim-devel/compiler/modules.nim(98) compileModule
.../nim-devel/compiler/passes.nim(180) processModule
.../nim-devel/compiler/passes.nim(73) processTopLevelStmt
.../nim-devel/compiler/sem.nim(644) myProcess
.../nim-devel/compiler/sem.nim(612) semStmtAndGenerateGenerics
.../nim-devel/compiler/semstmts.nim(2335) semStmt
.../nim-devel/compiler/semexprs.nim(1055) semExprNoType
.../nim-devel/compiler/semexprs.nim(2970) semExpr
.../nim-devel/compiler/semstmts.nim(2277) semStmtList
.../nim-devel/compiler/semexprs.nim(2852) semExpr
.../nim-devel/compiler/semexprs.nim(1038) semDirectOp
.../nim-devel/compiler/semexprs.nim(921) afterCallActions
.../nim-devel/compiler/semexprs.nim(39) semTemplateExpr
.../nim-devel/compiler/sem.nim(421) semAfterMacroCall
.../nim-devel/compiler/semstmts.nim(2335) semStmt
.../nim-devel/compiler/semexprs.nim(1055) semExprNoType
.../nim-devel/compiler/semexprs.nim(2970) semExpr
.../nim-devel/compiler/semstmts.nim(2277) semStmtList
.../nim-devel/compiler/semexprs.nim(2969) semExpr
.../nim-devel/compiler/semexprs.nim(2552) semBlock
.../nim-devel/compiler/semexprs.nim(2970) semExpr
.../nim-devel/compiler/semstmts.nim(2277) semStmtList
.../nim-devel/compiler/semexprs.nim(2978) semExpr
.../nim-devel/compiler/semstmts.nim(204) semTry
.../nim-devel/compiler/semstmts.nim(109) semExprBranchScope
.../nim-devel/compiler/semstmts.nim(102) semExprBranch
.../nim-devel/compiler/semexprs.nim(2970) semExpr
.../nim-devel/compiler/semstmts.nim(2277) semStmtList
.../nim-devel/compiler/semexprs.nim(2970) semExpr
.../nim-devel/compiler/semstmts.nim(2277) semStmtList
.../nim-devel/compiler/semexprs.nim(2852) semExpr
.../nim-devel/compiler/semexprs.nim(1038) semDirectOp
.../nim-devel/compiler/semexprs.nim(921) afterCallActions
.../nim-devel/compiler/semexprs.nim(39) semTemplateExpr
.../nim-devel/compiler/sem.nim(421) semAfterMacroCall
.../nim-devel/compiler/semstmts.nim(2335) semStmt
.../nim-devel/compiler/semexprs.nim(1055) semExprNoType
.../nim-devel/compiler/semexprs.nim(2970) semExpr
.../nim-devel/compiler/semstmts.nim(2277) semStmtList
.../nim-devel/compiler/semexprs.nim(2948) semExpr
.../nim-devel/compiler/semstmts.nim(162) semIf
.../nim-devel/compiler/semstmts.nim(102) semExprBranch
.../nim-devel/compiler/semexprs.nim(2970) semExpr
.../nim-devel/compiler/semstmts.nim(2277) semStmtList
.../nim-devel/compiler/semexprs.nim(2978) semExpr
.../nim-devel/compiler/semstmts.nim(204) semTry
.../nim-devel/compiler/semstmts.nim(109) semExprBranchScope
.../nim-devel/compiler/semstmts.nim(102) semExprBranch
.../nim-devel/compiler/semexprs.nim(2970) semExpr
.../nim-devel/compiler/semstmts.nim(2277) semStmtList
.../nim-devel/compiler/semexprs.nim(2970) semExpr
.../nim-devel/compiler/semstmts.nim(2277) semStmtList
.../nim-devel/compiler/semexprs.nim(2974) semExpr
.../nim-devel/compiler/semstmts.nim(675) semConst
.../nim-devel/compiler/semexprs.nim(86) semExprWithType
.../nim-devel/compiler/semexprs.nim(70) semExprCheck
.../nim-devel/compiler/semexprs.nim(2857) semExpr
.../nim-devel/compiler/semexprs.nim(276) semConv
.../nim-devel/compiler/semexprs.nim(851) semStaticExpr
.../nim-devel/compiler/vm.nim(2220) evalStaticExpr
.../nim-devel/compiler/vm.nim(2212) evalConstExprAux
.../nim-devel/compiler/vm.nim(815) rawExecute
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

Nim version: bbe05c1

@timotheecour
Copy link
Member

timotheecour commented Jul 7, 2021

@yglukhov can you:

  • always report nim version (git hash) in bug reports (in particular line numbers are not meaningful without a version)
  • minimize the example so it contains as few dependencies as possible, ideally only stdlib modules and preferably a single module without imports

to help narrow down, several flags can help:

  • building nim with --stacktrace --stacktraceMsgs -d:nimCompilerStacktraceHints so that you know where you are in user code in the stacktrace
  • instrumenting code in the place it crashes (maybe of opcWrDeref:) to show the line info in user code right before crash (info is c.debug[pc])
  • --processing:filenames
  • vmutils.vmTrace to trace VM execution in user code

EDIT: it's not impossible this is related to #17527, and that it'd be caused by a GC issue, can you read this comment #17527 (comment) and try with -d:useSysAssert -d:useGcAssert ?

@timotheecour timotheecour added VM see also `const` label Waiting on Author labels Jul 7, 2021
@yglukhov
Copy link
Member Author

@timotheecour were you unable to reproduce the crash?

@timotheecour
Copy link
Member

timotheecour commented Jul 29, 2021

doesn't change the stack trace of the crash

that's odd, it should; did you re-build nim with those flags (which is what's needed) or compile your program with those flags?

next step would be to minimize to remove any module dependency, since it's a compiler bug

@yglukhov
Copy link
Member Author

next step would be to minimize to remove any module dependency, since it's a compiler bug

Does it mean that you tried to reproduce it and failed?

@timotheecour
Copy link
Member

timotheecour commented Jul 29, 2021

with --stacktrace --stacktraceMsgs -d:nimCompilerStacktraceHints i get as of 7d3c3e0: https://gist.github.com/timotheecour/7f04794b8bc18a05e6e1047c6ae0b95f#file-gistfile1-txt-L2

which does show the stacktrace messages
you probably didn't compile nim with those flags and used those flags to compile the main program instead

(it crashed after running for 2 minutes)

Does it mean that you tried to reproduce it and failed?

it means it's hard to investigate without the bug minimized

@timotheecour
Copy link
Member

timotheecour commented Jul 29, 2021

TLDR: see workaround below, this would be fixed by nim-lang/RFCs#276

  • crashes in 1.0, 1.2, 1.4 also
  • doesn't compile in 0.19: iface.nim(63, 3) Error: undeclared identifier: 'signatureHash'
  • after instrumentation below it fails because n2 == nil:
        if regs[ra].nodeAddr == nil:
          stackTrace(c, tos, pc, "D20210729T134630")
        let n2 = regs[ra].nodeAddr[]
        if n2 == nil:
          stackTrace(c, tos, pc, "D20210729T134630.2")
        if n == nil:
          stackTrace(c, tos, pc, "D20210729T134630.3")
        if (nfIsRef notin regs[ra].nodeAddr[].flags and
            nfIsRef notin n.flags):
          regs[ra].nodeAddr[][] = n[]
  • vmutils.vmtrace shows it fails here:
/Users/timothee/git_clone/nim/temp/iface/tests/test1.nim(43, 23) [opcAsgnComplex]       let res = doSmth(d)
import iface, unittest
# import std/vmutils

iface Animal:
  proc say(): string
  proc test2()

type
  Dog = ref object of RootRef
    testCalled: bool

proc say(d: Dog): string = "bark"
proc test2(d: Dog) =
  d.testCalled = true

proc doSmth(a: Animal): string =
  a.test2()
  result = a.say()

proc createVoldemortObject(): Animal =
  type Hidden = ref object of RootRef
  proc test2(h: Hidden) = discard

suite "iface":
  test "runtime":
    let d = Dog.new()
    doAssert doSmth(d) == "bark"
    doAssert d.testCalled

  block:
    # static: vmTrace(true)
    when defined case1:
      const s = static:
        let d = Dog.new()
        let res = doSmth(d)
        doAssert(d.testCalled)
        0
    when defined case2:
      const s = (proc(): auto =
        let d = Dog.new()
        let res = doSmth(d)
        doAssert(d.testCalled)
        0)()

=> crashes with -d:case1
=> works with -d:case2, aka nim-lang/RFCs#276

    # const s = static:
    const s = (proc(): auto =
      let d = Dog.new()
      let res = doSmth(d)
      doAssert(d.testCalled)
      0)()

=> no crash

regardless of nim-lang/RFCs#276, this should be further investigated, ideally by finding a reduced example that exhibits the bug all the time instead of just once every 2 minutes

@timotheecour timotheecour added the const `const x=expr` or `static: stmt` label Jul 29, 2021
timotheecour added a commit to timotheecour/Nim that referenced this issue Jul 29, 2021
timotheecour added a commit to timotheecour/Nim that referenced this issue Jul 29, 2021
@yglukhov
Copy link
Member Author

yglukhov commented Aug 1, 2021

The crash happens, because PNode gets collected as no refs point to it. It is accessed though nodeAddr of registry. The value of the collected Node address is set here:

regs[ra].nodeAddr = addr src.sons[idx]

@timotheecour
Copy link
Member

The crash happens, because PNode gets collected as no refs point to it. It is accessed though nodeAddr of registry. The value of the collected Node address is set here:

possibly, but can you show a reduced test case (with no imports) to demonstrate this? Once we have a reduced test case, we can think about ways to fix this, possibly by adding gc'd references to all PNode's for which we've taken the address, storing in some seq[PNode]

@yglukhov
Copy link
Member Author

yglukhov commented Aug 1, 2021

possibly, but can you show a reduced test case (with no imports) to demonstrate this?

I don't have any, and I'm not sure it'll help, judging from the randomness of this issue. However, I've added some code to nim that proves my point and shows 100% reproducibility.

@timotheecour
Copy link
Member

timotheecour commented Aug 1, 2021

that's great to have a way to reproduce at 100%, I can indeed reproduce via #18628

However, we still need a minimized test case (now that we have a way to reproduce at 100% via that PR), to make it easier to understand the issue and fix it (even if it's indeed due to gc collecting nodeAddr)

A potential fix is by maintaining a seq[PNode] in the compiler as i suggested above, but it also raises the question of whether it'd prevent collecting things that should be collected; again, a minimized test case would help.

@yglukhov
Copy link
Member Author

yglukhov commented Aug 2, 2021

A potential fix is by maintaining a seq[PNode] in the compiler as i suggested above, but it also raises the question of whether it'd prevent collecting things that should be collected; again, a minimized test case would help.

I've been thinking of a similar approach, but not to use a seq[PNode], but just a gcHold: PNode field next to nodeAddr field, which should always be set along with nodeAddr to whatever is needed.

@yglukhov
Copy link
Member Author

yglukhov commented Aug 2, 2021

Ideally though it would be great to get rid of unsafe code (ptr that is) entirely. Not sure how feasible is that.

yglukhov added a commit to yglukhov/iface that referenced this issue Nov 30, 2021
@yglukhov
Copy link
Member Author

#19515 looks a lot like it could fix this issue, doesn't it?

Araq pushed a commit that referenced this issue Jun 22, 2022
 (#19902) [backport]

* revert #12217 since the root problem seems to have been fixed; fix #15974;fix #12551; fix #19464

* fix #16020; fix #16780

* fix tests and #16613

* fix #14553

* fix #19909; skip skipRegisterAddr

* fix #18641
narimiran pushed a commit that referenced this issue Jun 23, 2022
 (#19902) [backport]

* revert #12217 since the root problem seems to have been fixed; fix #15974;fix #12551; fix #19464

* fix #16020; fix #16780

* fix tests and #16613

* fix #14553

* fix #19909; skip skipRegisterAddr

* fix #18641

(cherry picked from commit 3cb2d7a)
capocasa pushed a commit to capocasa/Nim that referenced this issue Mar 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compiler Crash const `const x=expr` or `static: stmt` Converters VM see also `const` label
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants