Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential C FFI regression #19342

Closed
johnnovak opened this issue Jan 8, 2022 · 18 comments · Fixed by #19385 or #19461
Closed

Potential C FFI regression #19342

johnnovak opened this issue Jan 8, 2022 · 18 comments · Fixed by #19385 or #19461

Comments

@johnnovak
Copy link
Contributor

Summary

Found a potential C FFI regression with 1.6.2. A library of mine that has been in use for years has started crashing on 1.6.2. Tried turning off the GC with gc:none but the crash still happens, hence I think it's an FFI regression.

I can't really isolate it easily, but it's very easy to compile and run an example app included in the project that will result in a crash. No dependencies are required whatsoever.

Tested on

Win 10 (x64) and Mac OS X Big Sur 10.6.2 with Nim 1.6.2 and Nim 1.6.0

How to reproduce

  1. Check out this repo: https://github.com/johnnovak/nim-nanovg
  2. Compile & run the example with 1.6.2:
nimble examplesGL3Debug
cd examples
example_gl3.exe
  1. The program will immediately crash with the following error:
Traceback (most recent call last)
D:\Work\Code\nim-nanovg\examples\example_gl3.nim(161) example_gl3
D:\Work\Code\nim-nanovg\examples\example_gl3.nim(116) main
D:\Work\Code\nim-nanovg\examples\demo.nim(1017) renderDemo
D:\Work\Code\nim-nanovg\examples\demo.nim(305) drawEyes
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
  1. Repeat with 1.6.2 with gc:none or any other GC setting -- the exact same crash will happen
  2. Repeat with 1.6.0 with any GC setting -- the program will run correctly (it will open a window with some OpenGL demo graphics)
@Araq
Copy link
Member

Araq commented Jan 8, 2022

There are no FFI related changes in 1.6.2. Most likely is that it's an old bug in your wrapper that now got "activated". Please run it under Linux and valgrind and see what it reports.

@ringabout
Copy link
Member

Hello, @johnnovak

If you get some free time, could you try your example with git bisect?

https://nim-lang.org/docs/intern.html#bisecting-for-regressions

Thank you in advance.

@johnnovak
Copy link
Contributor Author

There are no FFI related changes in 1.6.2. Most likely is that it's an old bug in your wrapper that now got "activated". Please run it under Linux and valgrind and see what it reports.

That might well be it. I'll investigate it further when I'll have some time.

If you get some free time, could you try your example with git bisect?

Yeah, I'll try that too, thanks.

@oakes
Copy link

oakes commented Jan 9, 2022

I have the same problem with 1.6.2. My opengl-based text editor paravim is using tree sitter for syntax highlighting nim files, and starting with 1.6.2 it crashes when opening a nim file. Very easy to reproduce...clone vim_cubed and run...

nimble run vim3 src/core.nim

...results in:

Traceback (most recent call last)
/Users/sekao/Documents/vim_cubed/src/vim3.nim(44) vim3
/Users/sekao/.nimble/pkgs/paravim-0.18.2/paravim.nim(116) init
/Users/sekao/.nimble/pkgs/paravim-0.18.2/paravim/vim.nim(223) onAutoCommand
/Users/sekao/.nimble/pkgs/paravim-0.18.2/paravim/vim.nim(217) onBufEnter
/Users/sekao/.nimble/pkgs/paravim-0.18.2/paravim/tree_sitter.nim(88) parse
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

You can also clone pvim and run nimble run pvim src/pvim.nim to get the same result. These projects work fine in 1.6.0.

The error is happening on this line which is being nil-checked, i'm positive that i am not passing nil to that c function.

BTW john thank you for illwill, i friggin love it :D

@johnnovak
Copy link
Contributor Author

johnnovak commented Jan 11, 2022

BTW john thank you for illwill, i friggin love it :D

Cheers, glad you like it :)

I quickly had a look at paranim and you're not using glad for the OpenGL like I do in my project. So we're having the same crash problem with two different OpenGL wrapper implementations... Hmm, something fishy is definitely going on. I guess they can be wrong in a similar way, though, that's a possibility too.

Otherwise, I don't have easy access to a Linux box. I guess I could screw around with valgrind in WSL on Windows 10, but it's been ages since I used valgrind and I don't have high confidences in WSL... So if someone could take a look that would be appreciated. I could try to narrow it further down to a minimal example.

@oakes
Copy link

oakes commented Jan 11, 2022

I'm pretty sure the opengl use is just a coincidence because my crash is happening in the tree sitter code, which has nothing to do with it. In fact i can reproduce the crash in a small unit test without loading any opengl code at all.

@johnnovak
Copy link
Contributor Author

I'm pretty sure the opengl use is just a coincidence because my crash is happening in the tree sitter code, which has nothing to do with it. In fact i can reproduce the crash in a small unit test without loading any opengl code at all.

Maybe post a link here to the branch so we can all give it a go?

@oakes
Copy link

oakes commented Jan 12, 2022

Sure, here it is: https://github.com/paranim/paravim/tree/ffi_crash

Here's the unit test: https://github.com/paranim/paravim/blob/ffi_crash/tests/test1.nim

If i run nimble test, i get:

Traceback (most recent call last)
/Users/sekao/Documents/pvimpkg/tests/test1.nim(15) test1
/Users/sekao/Documents/pvimpkg/src/paravim/tree_sitter.nim(87) parse
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

But if i choosenim 1.6.0 and re-run, it runs without any error.

@ghost
Copy link

ghost commented Jan 13, 2022

This is indeed a regression and looks like a serious one. It's related to Nim optimizing object passing to pass-by-reference when the object is larger than 24 bytes, and it started doing that even for bycopy objects for some reason, so @Araq I do think it is a real issue.

Small code to reproduce:

type
  TSNode* {.bycopy.} = object
    pad*: array[25, uint8] # 25 bytes to trigger the bug

proc myproc_export(): TSNode {.cdecl, exportc.} = 
  discard

proc myproc(): TSNode {.importc: "myproc_export".}

proc parse =
  let node = myproc()
  echo node.pad[0]

parse()

Generated C code for the parse proc - you can see that myproc_export is called with an extra argument being the address of node and Nim doesn't care about the return value of this function:

N_LIB_PRIVATE N_NIMCALL(void, parse__tast_6)(void) {
	tyObject_TSNode__I9aoGqn58venVNwvMj7q6Ug node;
	tyArray__nHXaesL0DJZHyVS07ARPRA T1_;
	nimZeroMem((void*)(&node), sizeof(tyObject_TSNode__I9aoGqn58venVNwvMj7q6Ug));
	myproc_export((&node));
	nimZeroMem((void*)T1_, sizeof(tyArray__nHXaesL0DJZHyVS07ARPRA));
	T1_[0] = dollar___systemZdollars_9(((NU64) (node.pad[(((NI) 0))- 0])));
	echoBinSafe(T1_, 1);
}

Another example from @oakes's code (ts_tree_root_node is supposed to only take a single pointer argument and return a struct):

N_LIB_PRIVATE N_NIMCALL(tySequence__lqtQnDnN75JdopcKXmk8aQ**, parse__OOZsrcZparavimZtree95sitter_163)(void* tree, NI lineCount) {
	tySequence__lqtQnDnN75JdopcKXmk8aQ** result;
	result = (tySequence__lqtQnDnN75JdopcKXmk8aQ**)0;
	{
		tyObject_TSNode__3gvtIyLCgQ9cRZvYDNYxBjg node;
		if (!!((tree == NIM_NIL))) goto LA3_;
		nimZeroMem((void*)(&node), sizeof(tyObject_TSNode__3gvtIyLCgQ9cRZvYDNYxBjg));
		ts_tree_root_node(tree, (&node));
		result = (tySequence__lqtQnDnN75JdopcKXmk8aQ**) newObj((&NTInodes__H5rtJyrA54YseYSwQ9cU8WA_), sizeof(tySequence__lqtQnDnN75JdopcKXmk8aQ*));
		asgnRef((void**) (&(*result)), newSeq__OOZsrcZparavimZtree95sitter_175(((NI) (lineCount))));
		parse__OOZsrcZparavimZtree95sitter_74(node, &result);
	}
	LA3_: ;
	return result;
}

@ghost
Copy link

ghost commented Jan 13, 2022

The regression seems to be caused by #19115 as the code works fine with it reverted.

@johnnovak
Copy link
Contributor Author

Great, that proves that I'm not going mad :) For the records, I've checked my C wrappers multiple times and couldn't see anything wrong with them. I really don't have much time to mess around with this low-level jiggery-pokery, that's one of the reasons why I'm using Nim and not C/C++ in the first place, so thanks @Yardanico for putting together a test case! :salute:

@arnetheduck
Copy link
Contributor

This points to a bit more of a general problem in Nim: the options for selecting a particular ABI (for example the Nim ABI for libraries with name mangling etc, or the C ABI, or in the future, other ABI:s) for the codegen are somewhat sprawling - on the one hand, one wants the freedom to use whatever ABI fits the situation, in the compiler - on the other, for interop, it needs to be very specific - bycopy controls one aspect, but there are also others such as alignment, field ordering (with inheritance etc), argument ordering (where does the NRVO hack go, in the list? this is relevant for interop with other, RVO-implementing languages) and so on that are not ... documented / specified.

@Araq
Copy link
Member

Araq commented Jan 13, 2022

For a start, NVRO should not be done for .bycopy types. So it would be a bugfix of the bugfix. :-( Later we can design a set of pragmas that works better than what we have.

ringabout added a commit to ringabout/Nim that referenced this issue Jan 13, 2022
fix nim-lang#19342; Imo nrvo shouldn't touch importc procs
ringabout added a commit to ringabout/Nim that referenced this issue Jan 13, 2022
fix nim-lang#19342; Imo nrvo shouldn't touch importc object even with cdecl
ringabout added a commit to ringabout/Nim that referenced this issue Jan 13, 2022
ringabout added a commit to ringabout/Nim that referenced this issue Jan 13, 2022
narimiran pushed a commit that referenced this issue Jan 17, 2022
narimiran pushed a commit that referenced this issue Jan 17, 2022
narimiran pushed a commit that referenced this issue Jan 17, 2022
@johnnovak
Copy link
Contributor Author

Aren't you guys going to release a patch for this as a matter of urgency? Or at least announce it on the blog that there's a serious regression in 1.6.2? There must be tons of developers interfacing with C code encountering these issues and scratching their heads...

@Araq
Copy link
Member

Araq commented Jan 20, 2022

I'm waiting on #19363 and then 1.6.4 is ready to go. Sorry for the delay.

@johnnovak
Copy link
Contributor Author

Thanks @Araq

@narimiran
Copy link
Member

@johnnovak The 1.6.4RC is now available: https://forum.nim-lang.org/t/8839 (the official release should come later this week, if no new regressions are found)

@johnnovak
Copy link
Contributor Author

@johnnovak The 1.6.4RC is now available: https://forum.nim-lang.org/t/8839 (the official release should come later this week, if no new regressions are found)

Cheers, I'll give it a go.

Araq pushed a commit that referenced this issue Jan 28, 2022
* nvro don't touch cdecl types; fix #19342 again
narimiran pushed a commit that referenced this issue Jan 28, 2022
* nvro don't touch cdecl types; fix #19342 again

(cherry picked from commit 0c3892c)
PMunch pushed a commit to PMunch/Nim that referenced this issue Mar 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants