BPF JIT segfault due to mprotect failure #18177

Closed
leoluk opened this issue Jun 23, 2021 · 10 comments

@leoluk
Contributor

leoluk commented Jun 23, 2021

Version:

commit cd6e1d921c5edfe6f73a79eda2af699f62660aa0 (HEAD, tag: v1.7.1)
Author: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Date:   Tue Jun 8 07:48:39 2021 +0000

Environment:

$ cat /etc/redhat-release 
CentOS Linux release 8.1.1911 (Core) 
$ uname -a 
Linux 4.18.0-147.5.1.el8_1.x86_64 #1 SMP Wed Feb 5 02:00:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ rustup show 
Default host: x86_64-unknown-linux-gnu
rustup home:  /opt/solana/.rustup

installed toolchains
--------------------

stable-x86_64-unknown-linux-gnu (default)
nightly-x86_64-unknown-linux-gnu

active toolchain
----------------

stable-x86_64-unknown-linux-gnu (default)
rustc 1.51.0 (2fd73fabe 2021-03-23)

$ gcc --version 
gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1)
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Reproducible by deploying a new program:

solana deploy -u l wormhole.so

mprotect failure:

blockstore_proc(25914) mprotect (0x5619c9022000, 10293248, PROT_READ|PROT_EXEC)
blockstore_proc(25914)         mprotect -> -13 (EACCES)

/proc/self/maps:

561986fee000-561988e5d000 r-xp 00000000 fd:02 589645006                  /data/solana/solana/target/release/solana-validator
56198905d000-561989167000 r--p 01e6f000 fd:02 589645006                  /data/solana/solana/target/release/solana-validator
561989167000-56198916e000 rw-p 01f79000 fd:02 589645006                  /data/solana/solana/target/release/solana-validator
56198916e000-561989175000 rw-p 00000000 00:00 0 
56198b163000-5619c8fd3000 rw-p 00000000 00:00 0                          [heap]
5619c8fd3000-5619c8fdd000 r--p 00000000 00:00 0                          [heap]
5619c8fdd000-5619cabf3000 rw-p 00000000 00:00 0                          [heap]
[... about 400k account mmaps ...]
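
A quick way to see where the failing mprotect range lands is to look it up in the maps listing. The sketch below is only an illustration: it assumes the maps excerpt and the mprotect trace come from the same process, and the helper names `parse_maps` and `find_mapping` are made up for this example.

```python
# The three [heap] lines from the maps excerpt above.
MAPS = """\
56198b163000-5619c8fd3000 rw-p 00000000 00:00 0                          [heap]
5619c8fd3000-5619c8fdd000 r--p 00000000 00:00 0                          [heap]
5619c8fdd000-5619cabf3000 rw-p 00000000 00:00 0                          [heap]
"""


def parse_maps(text):
    """Yield (start, end, perms, name) for each /proc/<pid>/maps line."""
    for line in text.splitlines():
        fields = line.split()
        start, end = (int(part, 16) for part in fields[0].split("-"))
        name = fields[5] if len(fields) > 5 else ""
        yield start, end, fields[1], name


def find_mapping(text, addr, length):
    """Return the mapping that fully contains [addr, addr + length), or None."""
    for start, end, perms, name in parse_maps(text):
        if start <= addr and addr + length <= end:
            return start, end, perms, name
    return None


# Address and length taken from the failing mprotect call.
hit = find_mapping(MAPS, 0x5619c9022000, 10293248)
print(hit)
```

With these inputs the range falls entirely inside the third mapping, which is labelled [heap] and is currently rw-p.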

Backtrace:

Thread 457 (Thread 0x7faf88e56700 (LWP 31801)):
#0  0x00007fb485e4e7db in mprotect () from /lib64/libc.so.6
#1  0x0000561987e26a17 in solana_rbpf::jit::JitCompiler::compile ()
#2  0x0000561987e2ac57 in solana_rbpf::jit::JitProgram<E,I>::new ()
#3  0x0000561987e5fb73 in <solana_rbpf::elf::EBpfElf<E,I> as solana_rbpf::vm::Executable<E,I>>::jit_compile ()
#4  0x0000561987e15b10 in solana_bpf_loader_program::create_executor ()
#5  0x0000561987e172b7 in solana_bpf_loader_program::process_instruction_common ()
#6  0x0000561987e16544 in solana_bpf_loader_program::process_instruction_jit ()
(gdb) info registers 
rax            0xfffffffffffffff3  -13
rbx            0x7faf88e51f00      140391892721408
rcx            0x7fb485e4e7db      140413317212123
rdx            0x5                 5
rsi            0x9d1000            10293248
rdi            0x5619c9022000      94668746530816
rbp            0x7faf88e51d70      0x7faf88e51d70
rsp            0x7faf88e51c08      0x7faf88e51c08
r8             0x344085981d613ce   235322261053510606
r9             0xf                 15
r10            0x7fae8d26e350      140387669173072
r11            0x202               514
r12            0x7fb3ab5ac0b0      140409650725040
r13            0x7faf88e51d71      140391892721009
r14            0x7faf88e51f88      140391892721544
r15            0x5619890c5b58      94667673459544
rip            0x7fb485e4e7db      0x7fb485e4e7db <mprotect+11>
eflags         0x202               [ IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
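
The register dump lines up with the mprotect trace. As a small worked check (a sketch, assuming the x86-64 Linux syscall convention where rdi, rsi and rdx hold the first three arguments and rax holds the raw return value, and using the Linux PROT_* bit values):

```python
import errno

# Values copied from the gdb register dump above.
rax = 0xfffffffffffffff3
rdi = 0x5619c9022000
rsi = 0x9d1000
rdx = 0x5

# Linux protection bits (assumed values for x86-64 Linux).
PROT_BITS = {"PROT_READ": 0x1, "PROT_WRITE": 0x2, "PROT_EXEC": 0x4}


def to_signed64(value):
    """Interpret a 64-bit register value as a signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


ret = to_signed64(rax)
err_name = errno.errorcode[-ret]
prot_names = [name for name, bit in PROT_BITS.items() if rdx & bit]

print(f"mprotect({rdi:#x}, {rsi}, {'|'.join(prot_names)}) -> {ret} ({err_name})")
```

So rax is -13 (EACCES), rsi is the 10293248-byte length, and rdx is PROT_READ|PROT_EXEC, matching the traced call.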

Example of the segfault that follows an mprotect failure like the one above (taken from a different crash!):

[Wed Jun 23 08:34:22 2021] blockstore_proc[3073745]: segfault at 5612ec601000 ip 00005612ec601000 sp 00007fb183276dc8 error 15
[Wed Jun 23 08:34:22 2021] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <55> 48 89 e5 53 41 54 41 55 41 56 41 57 49 89 d2 48 bb 00 10 00 00
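
In the crash line the faulting address equals the instruction pointer, which points at a fault on instruction fetch. The sketch below parses the line and decodes the error field. It assumes the kernel prints that field in hexadecimal and that the bits follow the x86 page-fault error code layout; `decode_segfault` is a hypothetical helper name.

```python
import re

LINE = (
    "[Wed Jun 23 08:34:22 2021] blockstore_proc[3073745]: segfault at 5612ec601000 "
    "ip 00005612ec601000 sp 00007fb183276dc8 error 15"
)

PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)

# x86 page-fault error code bits (low five bits only).
FLAGS = [
    (0x01, "protection violation on a present page"),
    (0x02, "write access"),
    (0x04, "user mode"),
    (0x08, "reserved bit set"),
    (0x10, "instruction fetch"),
]


def decode_segfault(line):
    """Extract the fault address, ip and decoded error flags from a dmesg line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a segfault line")
    code = int(match["err"], 16)
    return {
        "addr": int(match["addr"], 16),
        "ip": int(match["ip"], 16),
        "code": code,
        "flags": [text for bit, text in FLAGS if code & bit],
    }


info = decode_segfault(LINE)
print(hex(info["addr"]), info["addr"] == info["ip"], info["flags"])
```

Read this way, error 15 (0x15) is a user-mode instruction fetch from a page that is present but not executable, which fits a JIT buffer whose mprotect to PROT_EXEC did not take effect.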

Disassembly:

0000000000000000 55                              PUSH RBP
0000000000000001 4889E5                          MOV RBP,RSP
0000000000000004 53                              PUSH RBX
0000000000000005 4154                            PUSH R12
0000000000000007 4155                            PUSH R13
0000000000000009 4156                            PUSH R14
000000000000000B 4157                            PUSH R15
000000000000000D 4989D2                          MOV R10,RDX
0000000000000010 48BB00100000                    MOV RBX,imm64 (truncated; low 32 bits 0x00001000)
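
The 0x41 byte in front of four of the pushes is a REX prefix with the B bit set, which selects registers R8 to R15, so those pushes target R12 through R15. A minimal decoder sketch covering only the two instruction forms in this prologue (PUSH r64 and register-to-register MOV with REX.W) shows the decoding; it is an illustration, not a general disassembler:

```python
REGS = ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI"] + [
    f"R{n}" for n in range(8, 16)
]

# Bytes following the <55> marker in the kernel's Code: line, up to MOV R10,RDX.
CODE = bytes.fromhex("55 48 89 e5 53 41 54 41 55 41 56 41 57 49 89 d2")


def decode(code):
    """Decode PUSH r64 and MOV r/m64,r64 (register form) with optional REX."""
    out = []
    i = 0
    while i < len(code):
        rex = 0
        if 0x40 <= code[i] <= 0x4F:  # REX prefix: 0100WRXB
            rex = code[i]
            i += 1
        op = code[i]
        rex_r = (rex >> 2) & 1
        rex_b = rex & 1
        if 0x50 <= op <= 0x57:  # PUSH r64, register number in the low 3 bits
            out.append(f"PUSH {REGS[(op - 0x50) | (rex_b << 3)]}")
            i += 1
        elif op == 0x89 and rex & 0x08 and code[i + 1] >> 6 == 3:
            modrm = code[i + 1]
            reg = ((modrm >> 3) & 7) | (rex_r << 3)  # source register
            rm = (modrm & 7) | (rex_b << 3)          # destination register
            out.append(f"MOV {REGS[rm]},{REGS[reg]}")
            i += 2
        else:
            raise ValueError(f"unsupported opcode {op:#x} at offset {i}")
    return out


listing = decode(CODE)
print("\n".join(listing))
```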

rbpf crate fix that catches the mprotect error:

@mvines mvines modified the milestones: v1.7.2, v1.7.4 Jun 23, 2021
@im-0
Contributor

im-0 commented Jul 2, 2021

I have the same problem on my TdS validator:

...
[91530.108503] blockstore_proc[2076]: segfault at 56069edb2000 ip 000056069edb2000 sp 00007fa2f7beb958 error 15
[91530.108508] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <55> 48 89 e5 53 41 54 41 55 41 56 41 57 49 89 d2 48 bb 00 10 00 00
[93636.186928] blockstore_proc[27291]: segfault at 55beaefc4000 ip 000055beaefc4000 sp 00007f58ab480938 error 15
[93636.186934] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <55> 48 89 e5 53 41 54 41 55 41 56 41 57 49 89 d2 48 bb 00 10 00 00
[154662.796795] blockstore_proc[28516]: segfault at 55a9dc2bc000 ip 000055a9dc2bc000 sp 00007efd3c95a0f8 error 15
[154662.796803] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <55> 48 89 e5 53 41 54 41 55 41 56 41 57 49 89 d2 48 bb 00 10 00 00
...
[170790.090160] blockstore_proc[45269]: segfault at 55ecf6f08000 ip 000055ecf6f08000 sp 00007efe537e6958 error 15
[170790.090167] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <55> 48 89 e5 53 41 54 41 55 41 56 41 57 49 89 d2 48 bb 00 10 00 00
...
  • Solana version: 1.7.3
  • Linux distribution: Fedora release 34 (Thirty Four)
  • Kernel version: 5.12.12-300.fc34.x86_64
  • Rust version: 1.53.0 (from Fedora repositories)

@im-0
Contributor

im-0 commented Jul 7, 2021

Happened again with v1.7.4.

@Lichtso
Contributor

Lichtso commented Jul 7, 2021

I just merged the potential mitigation today: #18068
So it will still take some time before the next release picks it up, unless you want to try it out by cherry-picking it.

Nonetheless, the conditions to reproduce this still remain a mystery ...

@im-0
Contributor

im-0 commented Jul 10, 2021

Thanks! This happens relatively rarely, and systemd just restarts the validator after the crash. I'll just wait until the next release.

@leoluk
Contributor Author

leoluk commented Jul 10, 2021

> Nonetheless, the conditions to reproduce this still remain a mystery ...

I managed to reproduce it by deploying a contract 🎉

The bad news is that it happens every time a new contract is deployed, so we definitely need to figure out the root cause. What happens with the mitigation? Will the validator mark the slot dead and fork every time a failure occurs?

@ryoqun
Member

ryoqun commented Jul 15, 2021

(Just a newbie's comment: this deploy-triggered oddity reminds me of #17350, which was hard to pin down (ref: #17083 (comment)). I know @Lichtso knows this code far better than I do, so this is just my 2 cents.)

@leoluk
Contributor Author

leoluk commented Jul 15, 2021

It's deploy-triggered, I think, because a deploy causes a new JIT segment to be mapped.

@im-0
Contributor

im-0 commented Jul 30, 2021

I found it much easier to cherry-pick relevant patches on solana_rbpf instead of cherry-picking rbpf updates on solana itself. Squashing merges are not helpful here =/.

If anyone is interested, here are the patches for solana_rbpf v0.2.11 (solana 1.7.8). The patched testnet validator has been running fine for 12 hours since the update.
0001-Fix-libc-error-detection-182.patch.txt
0002-Use-mmap-instead-of-memalign-184.patch.txt
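
Going only by the two patch titles, the pattern is: allocate the code region with mmap rather than from the heap, and check the libc return value instead of ignoring it. A minimal Python sketch of that pattern follows. It is not the solana_rbpf code; `alloc_code_region` and `seal_executable` are hypothetical names, and a Unix-like system with a loadable libc is assumed.

```python
import ctypes
import ctypes.util
import mmap
import os

# use_errno=True makes ctypes keep errno from each call for get_errno().
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
libc.mprotect.restype = ctypes.c_int


def alloc_code_region(size):
    """Reserve page-aligned, writable memory with an anonymous private mmap."""
    pages = -(-size // mmap.PAGESIZE)  # round up to whole pages
    return mmap.mmap(
        -1,
        pages * mmap.PAGESIZE,
        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )


def seal_executable(region):
    """Switch the region to read+execute; raise OSError if mprotect fails."""
    view = ctypes.c_char.from_buffer(region)
    try:
        rc = libc.mprotect(
            ctypes.addressof(view), len(region), mmap.PROT_READ | mmap.PROT_EXEC
        )
        if rc != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        del view  # release the buffer export so the mmap can be closed


region = alloc_code_region(64)
region[:1] = b"\xc3"  # one x86-64 RET, standing in for generated code
try:
    seal_executable(region)
    status = "sealed"
except OSError as exc:
    status = f"mprotect failed: {exc}"
print(status)
region.close()
```

The point of the sketch is that a failed mprotect surfaces as an error the caller can handle, rather than leaving a non-executable buffer that is later jumped into.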

@CriesofCarrots CriesofCarrots modified the milestones: v1.7.4, v1.7.11 Aug 27, 2021
@leoluk
Contributor Author

leoluk commented Sep 1, 2021

This still occurs with 1.6.22:

[Wed Sep  1 02:14:32 2021] blockstore_proc[2749251]: segfault at 56220f8a1000 ip 000056220f8a1000 sp 00007fa1689d00b8 error 15
[Wed Sep  1 02:14:32 2021] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <55> 48 89 e5 53 41 54 41 55 41 56 41 57 49 89 d2 48 bb 00 10 00 00

Any chance the fix could be backported to 1.6.x?

@im-0
Contributor

im-0 commented Sep 1, 2021

Interestingly, I have never seen this on mainnet with v1.6.*, but it happened roughly weekly on testnet with v1.7.*.

I am currently running 1.7.11 with patched rbpf on testnet (no crashes since switching to the patched build) and 1.6.22 on mainnet. Both on Fedora 34, kernel 5.13.12.

@CriesofCarrots CriesofCarrots modified the milestones: v1.7.11, v1.7.15 Sep 30, 2021
@jstarry jstarry modified the milestones: v1.7.15, v1.7.17 Oct 22, 2021
@CriesofCarrots CriesofCarrots modified the milestones: v1.7.17, v1.7.18 Oct 28, 2021
@mvines mvines modified the milestones: v1.7.18, The Future! Dec 3, 2021
@Lichtso Lichtso closed this as completed May 2, 2022