Cannot see native call frames in GDB #2587
@YaSuenag it would be great if you could come up with a reproducer that does not involve changing how libpolyglot gets built.
AFAICS libpolyglot.so in GraalVM 20.1.0 and upstream does not have DWARF entries for the Java layer, so we need to recompile it with debug info generation enabled. I tried to build GraalVM with the following patch, but it still does not work well in GDB.
Of course, we can reproduce the same problem on GraalVM 20.1.0 with my reproducer (libpolyglot-crash.c).
Hi @YaSuenag @olpaw, if so, can you provide precise instructions on how to build the library? (It does not appear to be built when I run mx build in the substratevm directory.) @YaSuenag, are you sure your build tree includes this PR, which is needed to allow gdb 9.x to work? (That is the version of gdb used on Fedora 32.)
Yes, libpolyglot.so is built from PolyglotNativeAPI.java by the native image generator, so I expect to see Java frames in the native backtrace.
I run
Yes, head of my source tree is 64d60d2.
Hi @YaSuenag Thanks for the info. I'll try to follow your instructions and see if I can find an error in the DWARF debug info or in gdb that explains the crash.
Hi @YaSuenag The first problem is that gdb needs to know where to find libpolyglot.so. I ran the program from inside gdb with LD_LIBRARY_PATH set correctly and saw this:
So, if LD_LIBRARY_PATH is set when gdb is started, the debugger appears to find symbols and maps code addresses back to the original source file. The only oddity I see above is that method PolyglotNativeAPI_poly_create_context_builder_232c64e.... is located at line 1 of file IsolateEnterStub.java. That's not too surprising, as it appears to be a compiler-generated method.
The second problem is that the debugger does not find a thread stack in the core file.
I think this is because the crash is not dumping details of the stack at the SEGV into the core file. I don't think it is a problem with the debuginfo because the debug session shows gdb working right up to the crash point.
Thanks @adinn! I could see call stacks when I set a breakpoint in GDB. I checked the details below. I guess the information we need is available, but it does not seem to work well in GDB. Do you have any idea how to resolve it?
Backtrace on GDB
Load address from
I'm not sure why gdb is failing to resolve the crash pc in the core to an address in libpolyglot.so. It does look like all the information needed is available to gdb. I will see if I can identify the problem by debugging gdb itself. Also, I will ask the Red Hat gdb team if they can explain what is going wrong.
Thank you so much @adinn! BTW, I tried to use LLDB (lldb-10.0.0-1.fc32.x86_64) to analyze the core, and it shows an interesting backtrace, as below:
I wonder why I checked symbols in libpolyglot.so with |
Hi @YaSuenag
and after that address they get listed as
i.e. this looks like a problem with the .debug_aranges section. The last range for which a method name was printed was StrictMathInvoker::tanh, and its range section includes this
i.e. the address ranges in this compilation unit include a massive jump forward out of the normal generated method code range. Later range entries are ignored because they switch back to addresses in the range starting at 0x804860. Looking at the info section around that sudden offset jump I see this
The methods named StrictMathInvoker::acos** and StrictMathInvoker::asin** appear to be stubs that invoke methods of libm. The current debug info generator assumes that the methods associated with a given class are all compiled at once and occupy a contiguous address range in the generated code section. The range generator simply walks through the methods in compile order and outputs ranges based on each method's offset in the code segment. So it looks like I will either have to filter out the range entries for these stubs or do some sorting before generating the address ranges. I am surprised the asserts in the debug info generator code did not object to successive ranges for classes crossing over each other. I will investigate further and let you know when I work out what is needed to fix this.
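The kind of check mentioned in the comment above (successive ranges crossing over each other) can be illustrated with a minimal sketch. This is not the generator's actual code, which is written in Java; the `MethodRange` type, the `find_backward_jumps` helper, and all addresses are hypothetical and exist only for illustration.

```python
from typing import List, NamedTuple, Tuple


class MethodRange(NamedTuple):
    """Hypothetical record for one emitted range entry."""
    owner: str  # declaring class
    name: str   # method name
    lo: int     # first code address (inclusive)
    hi: int     # end code address (exclusive)


def find_backward_jumps(
    emitted: List[MethodRange],
) -> List[Tuple[MethodRange, MethodRange]]:
    """Return consecutive pairs, in emit order, where the later range
    starts below the end of the earlier one."""
    jumps = []
    for prev, cur in zip(emitted, emitted[1:]):
        if cur.lo < prev.hi:
            jumps.append((prev, cur))
    return jumps


if __name__ == "__main__":
    # Made-up layout: class A's '**' method sits far ahead of the normal
    # code, and class B's normal method then switches back to low addresses.
    emitted = [
        MethodRange("A", "foo", 0x1000, 0x1040),
        MethodRange("A", "foo**", 0x9000, 0x9010),
        MethodRange("B", "bar", 0x1040, 0x1080),
    ]
    for prev, cur in find_backward_jumps(emitted):
        print(f"{cur.owner}::{cur.name} at {cur.lo:#x} follows "
              f"{prev.owner}::{prev.name} ending at {prev.hi:#x}")
```

A walk like this only flags the symptom. Whether to filter the offending entries or sort them is the separate design decision discussed in the thread.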
It seems these extra methods, whose names end with a '**' suffix, are used as deoptimization targets. They are generated after all the normal methods associated with the class. That means each class is associated with two distinct ranges of code addresses, and ranges from one class can be interleaved with ranges from a later class. Unfortunately, the compilation units in the info section need to be sorted in non-overlapping, lo-to-hi address ranges. Similarly, the frame and range sections need to list method frame info and range info in non-overlapping, lo-to-hi address ranges. So, in order to keep gdb happy, I have to treat a class which has deopt targets as two compilation units, splitting its methods between them and generating the relevant content in two stages. I have a patch ready and will raise a PR for it.
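The splitting described above can be sketched as follows. This is an illustration only, not the patch itself: the `Method` and `CompilationUnit` types, the `is_deopt_target` flag, and the addresses are hypothetical. It also assumes that a class's normal methods form one contiguous address range and its deopt targets form another.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, NamedTuple, Tuple


class Method(NamedTuple):
    """Hypothetical record for one compiled method."""
    owner: str             # declaring class
    name: str
    lo: int                # first code address (inclusive)
    hi: int                # end code address (exclusive)
    is_deopt_target: bool  # True for the '**' methods


@dataclass
class CompilationUnit:
    owner: str
    is_deopt: bool
    methods: List[Method]  # kept sorted by lo

    @property
    def lo(self) -> int:
        return self.methods[0].lo

    @property
    def hi(self) -> int:
        return self.methods[-1].hi


def build_compilation_units(methods: List[Method]) -> List[CompilationUnit]:
    """Give each class up to two units (normal code, deopt targets),
    then order the units from low to high address."""
    groups: Dict[Tuple[str, bool], List[Method]] = defaultdict(list)
    for m in methods:
        groups[(m.owner, m.is_deopt_target)].append(m)
    units = [
        CompilationUnit(owner, is_deopt, sorted(ms, key=lambda m: m.lo))
        for (owner, is_deopt), ms in groups.items()
    ]
    units.sort(key=lambda u: u.lo)
    return units


def is_sorted_non_overlapping(units: List[CompilationUnit]) -> bool:
    """True when every unit ends at or before the start of the next one."""
    return all(a.hi <= b.lo for a, b in zip(units, units[1:]))
```

With the units ordered this way, the info, frame, and range content can each be emitted by walking the same list, so all three sections follow the same lo-to-hi order.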
I tried to see native call frames of a customized GraalVM, built with -H:GenerateDebugInfo=1, in a native debugger (GDB) on Linux, but I could not see them in GDB. I expect to see the call frames that are implemented in Java in libpolyglot.
objdump --dwarf=info libpolyglot.so shows DWARF information for the Java layer, so GenerateDebugInfo seems to be applied partially.
Steps to reproduce the issue
Build GraalVM with -H:GenerateDebugInfo=1 with the following patch
libpolyglot-crash.c
Describe GraalVM and your environment: