Illegal instruction occurs in simple scenario #1495
Comments
From [email protected] on August 04, 2014 17:02:07 I found the program works today. It deterministically failed yesterday. |
From [email protected] on August 05, 2014 00:56:37 the call stack trace from core dump is |
From [email protected] on August 05, 2014 01:22:02 I managed to reproduce the issue. Here is a link to a test case that reproduces it: https://hkustconnect-my.sharepoint.com/personal/yueqili_connect_ust_hk/_layouts/15/guestaccess.aspx?guestaccesstoken=E6lGK8o%2bbL87X4HZ4isgwwfUn7VxdRyN33Id7A7ggfc%3d&docid=02538b063d67e4024911fa7134b6bd7bd In dr_replay, run run1.sh to start the instrumented memcached and run2.sh to send a command to memcached; the illegal instruction then occurs. |
From [email protected] on August 05, 2014 01:32:04 The problem can be reproduced using both the trunk development version and the stable version. |
From [email protected] on August 05, 2014 09:02:11 The key question is, what is the instruction inside __lll_lock_elision? Is it xbegin? Xref issue #1314 . Status: Started |
From [email protected] on August 05, 2014 16:33:53 I can't reproduce: I rebuilt memcached and libsyminput.so, launched memcached under DR with libsyminput.so, and then sent it data: but no repro, whether using release or debug DR (debug DR does complain about a memory leak on exit when I then ^C memcached -- separate issue). Does your processor have Intel's TSX? If you can repro again and get gdb on live or the core dump, can you disassemble the pc so we can have the actual instruction? Status: NeedInfo |
From [email protected] on August 05, 2014 16:54:17 Does your processor have Intel's TSX? If you can repro again and get gdb on live or the core dump, can you disassemble the pc so we can have the actual instruction? |
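One way to answer the TSX question on Linux is to look for the `rtm` and `hle` feature flags that the kernel reports in `/proc/cpuinfo`. The following is a minimal sketch, not part of the original thread; the helper names `cpu_flags` and `tsx_support` are hypothetical, and it only reflects what the kernel exposes, not what libpthread decided at init time.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # The first "flags" line is enough: all cores normally report the same set.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def tsx_support(cpuinfo_text):
    """Report whether the RTM (xbegin/xend) and HLE flags are present."""
    flags = cpu_flags(cpuinfo_text)
    return {"rtm": "rtm" in flags, "hle": "hle" in flags}
```

To use it, pass the contents of `/proc/cpuinfo`, for example `tsx_support(open("/proc/cpuinfo").read())`. An `rtm` value of `False` on a machine that still executes `xbegin` would point at a wrong elision decision rather than at the hardware.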
From [email protected] on August 05, 2014 16:57:01
x/5i $pc |
From [email protected] on August 05, 2014 17:17:27 Thank you. |
From [email protected] on August 05, 2014 18:52:05 I have reproduced it now. |
From [email protected] on August 05, 2014 19:18:47 I found the instruction in libpthread. |
From [email protected] on August 05, 2014 21:01:52 libpthread is supposed to only use xbegin if it determines at init time that the hardware supports lock elision. The question is, did DR somehow mess up its determination on what the hardware supports? That seems unlikely but it's possible. Ideally, if you could reproduce it natively (i.e., not under DR) we would have a definitive answer. If you file the libpthread bug now they will probably blame DR. Have you ever seen it without a client library? Qin, any thoughts on whether the private loader might somehow mess up the app libpthread's global vars (I'm looking at __pthread_force_elision)? lionghostshop, if you still have a core dump or an attached gdb, can you run "p __pthread_force_elision"? Cc: [email protected] |
From [email protected] on August 05, 2014 21:14:14 Have you ever seen it without a client library? |
From [email protected] on August 05, 2014 21:36:26 One thing I found is that the problem seems related to certain boots. |
From [email protected] on August 06, 2014 07:44:56 I found that plugging in the cable affects whether the bug reproduces. I use my laptop in two places: sometimes on wireless and sometimes wired. Note that without DR, memcached always works fine. |
From [email protected] on August 06, 2014 07:49:46 I wonder if you could get us access to a core dump? I'd like to see the code cache code for pthread_mutex_lock to investigate a theory that the rip-rel mangling that accesses __pthread_force_elision is failing depending on where libpthread is loaded. |
From [email protected] on August 06, 2014 07:52:49 I include two core dumps and the memcached binary. Attachment: dr_replay.tar.bz2 |
From [email protected] on August 06, 2014 07:57:33 Could you also include your libdynamorio.so binary + debuginfo file, as it's a custom build, right? |
From [email protected] on August 06, 2014 07:59:28 The package can be downloaded from comment |
From [email protected] on August 06, 2014 17:17:38 Is there a way to set __pthread_force_elision to mask the problem temporarily? |
From [email protected] on August 06, 2014 17:29:12
I spent a little time on this today and it looks like it's not even checking __pthread_force_elision (or at least not locally). I will have to study the pthreads code again to understand what it's doing in order to posit theories on how things went wrong. My notes:
% gdb ./memcached core.2328
(gdb) add-symbol-file '/work/dr/bugs/i1495/DR/DynamoRIO_replay_build/lib64/release/libdynamorio.so' 0x71000000
(gdb) disas pthread_mutex_lock
(gdb) p *shared_bb->table[(((0x00007ffec526bffd * 11400714819323198485) >> (64 - shared_bb->hash_bits)) & shared_bb->hash_mask)+0]
So it did not go to 0x00007ffec526c007 or 0x00007ffec526c00f.
(gdb) disas __lll_lock_elision
So it came through that later entry. Hmm, so it's not checking
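The table lookup in the gdb notes above is a multiplicative hash: the tag is multiplied by a 64-bit golden-ratio constant, the top `hash_bits` bits of the product are kept, and the result is masked. A minimal sketch of that arithmetic follows. The function name `table_index` is hypothetical, the actual `hash_bits` and `hash_mask` values are not given in the thread, and the explicit 64-bit wraparound is an assumption about how the unsigned multiply behaves.

```python
PHI_MULTIPLIER = 11400714819323198485  # constant taken from the gdb expression
U64_MASK = (1 << 64) - 1


def table_index(tag, hash_bits, hash_mask):
    """Mirror ((tag * PHI) >> (64 - hash_bits)) & hash_mask with 64-bit wraparound."""
    product = (tag * PHI_MULTIPLIER) & U64_MASK  # emulate unsigned 64-bit overflow
    return (product >> (64 - hash_bits)) & hash_mask
```

For example, `table_index(0x00007ffec526bffd, hash_bits, hash_mask)` gives the slot to inspect once the two table parameters are read from `shared_bb` in the core dump.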
From [email protected] on August 06, 2014 17:34:10 Will it help if you also check the source code of pthread? |
corresponding to
where
So there are two possible ways to get to the
So it means that somehow the type is set to 0x100
|
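The reasoning in the comment above (two routes into the elision path, one of them requiring the 0x100 bit in the mutex type) can be modelled roughly as follows. This is a simplified sketch and not the glibc source: the flag values and the dispatch structure are assumptions recalled from glibc's lock-elision code and should be checked against the libpthread version actually in use, and `lock_path` is a hypothetical name.

```python
# Assumed flag values; verify against the glibc source for your libpthread.
PTHREAD_MUTEX_TIMED_NP = 0
PTHREAD_MUTEX_ELISION_NP = 0x100
PTHREAD_MUTEX_NO_ELISION_NP = 0x200
ELISION_FLAGS = PTHREAD_MUTEX_ELISION_NP | PTHREAD_MUTEX_NO_ELISION_NP


def lock_path(kind, force_elision):
    """Return (path, new_kind) for a simplified model of the lock dispatch."""
    mutex_type = kind & (0x7F | PTHREAD_MUTEX_ELISION_NP)
    if mutex_type == PTHREAD_MUTEX_TIMED_NP:
        # Route 1: a plain mutex is upgraded because elision is forced globally.
        if force_elision and (kind & ELISION_FLAGS) == 0:
            return "elision", kind | PTHREAD_MUTEX_ELISION_NP
        return "simple", kind
    if mutex_type == (PTHREAD_MUTEX_TIMED_NP | PTHREAD_MUTEX_ELISION_NP):
        # Route 2: the mutex type already carries the 0x100 elision bit.
        return "elision", kind
    return "other", kind
```

In this model, `lock_path(0x100, False)` still reaches the elision path, which is the case the comment describes: the force flag need not be consulted if the type already has the 0x100 bit set.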
From [email protected] on August 04, 2014 01:54:31
For the Summary, please follow the guidelines at https://code.google.com/p/dynamorio/wiki/BugReporting and use one of the CRASH, APP CRASH, HANG, or ASSERT keywords
What version of DynamoRIO are you using?
Trunk r2734 | derek.bruening | 2014-08-04 00:01:48 +0800 (Mon, 04 Aug 2014)
What operating system version are you running on?
Fedora 20, 64-bit
What application are you running?
memcached v1.4.20, built from the official source at http://memcached.org/
Is your application 32-bit or 64-bit?
64-bit
How are you running the application under DynamoRIO?
DynamoRIO_build/bin64/drrun -steal_fds 0 -v -c build/libsyminput.so -- memcached_org/install/bin/memcached
What happens when you run without any client?
It runs successfully.
What happens when you run with debug build ("-debug" flag to drrun/drconfig/drinject)?
Illegal instruction
What steps will reproduce the problem?
1. Download memcached from http://memcached.org/ and build it using the default configuration. Download dynamorio using svn and drmemory stable from http://www.drmemory.org/
2. Build my simple client in the attachment.
3. DynamoRIO_build/bin64/drrun -steal_fds 0 -v -c build/libsyminput.so -- memcached_org/install/bin/memcached
echo stats |ncat localhost 11211
What is the expected output? What do you see instead? Is this an application crash, a DynamoRIO crash, a DynamoRIO assert, or a hang (see https://code.google.com/p/dynamorio/wiki/BugReporting and set the title appropriately)?
It crashes with an illegal instruction.
Please provide any additional information below.
Attachment: sym.tar
Original issue: http://code.google.com/p/dynamorio/issues/detail?id=1495