Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

btf: Take instruction size into account when handling poison relocation #1351

Merged
merged 2 commits into from
Feb 21, 2024

Conversation

dylandreimerink
Copy link
Member

A CO-RE relocation might be poisoned, in such cases we write a "bogus" instruction to the relocation target. This can still result in working code if that instruction turns out to be in a dead code branch due to other CO-RE logic.

Currently this bogus instructions is always a function call to a non-existing function. This is a problem when the original instruction is a dword load immediate, which takes 2 instructions worth of space. When we replace this with a function call, we shrink the instruction stream which throws off offsets and instruction metadata.

So this commit makes makes the CO-RE logic check if we are dealing with a dword load immediate, and if so, we replace it with a dword imm load of 0xbad2310 to R10, which is illigal and will trip the verifier if evaluated.

Fixes: #1348

@dylandreimerink dylandreimerink requested a review from a team as a code owner February 19, 2024 13:38
@dylandreimerink dylandreimerink added the bug Something isn't working label Feb 19, 2024
@lmb
Copy link
Collaborator

lmb commented Feb 20, 2024

Thanks for diagnosing this!

  • Does libbpf do the same write into R10 trick, or something different?
  • Can you add a test?

P.S. The reason we don't adjust automatically for this is because the offset is encoded in the instruction stream. Does the jump instruction in the ELF carry a relocation to an LBB_ block? We strip those here:

ebpf/elf_reader.go

Lines 258 to 260 in e4ec617

// LLVM emits LBB_ (Local Basic Block) symbols that seem to be jump
// targets within sections, but BPF has no use for them.
if symType == elf.STT_NOTYPE && elf.ST_BIND(symbol.Info) == elf.STB_LOCAL &&

@dylandreimerink
Copy link
Member Author

Does libbpf do the same write into R10 trick, or something different?

I came up with this myself, I didn't think to look at libbpf's approach. They seem to emit 2 call instructions in this case:

if (res->poison) {
poison:
	/* poison second part of ldimm64 to avoid confusing error from
	 * verifier about "unknown opcode 00"
	 */
	if (is_ldimm64_insn(insn))
		bpf_core_poison_insn(prog_name, relo_idx, insn_idx + 1, insn + 1);
	bpf_core_poison_insn(prog_name, relo_idx, insn_idx, insn);
	return 0;
}

We could just copy that.

Can you add a test?

Yea, ofcourse.

Does the jump instruction in the ELF carry a relocation to an LBB_ block?

It does, but I think the current fix is good enough?

bpf_bpf.o:      file format elf64-bpf

Disassembly of section kprobe/tcp_close:

0000000000000000 <kprobe_tcp_close>:
       0:       18 01 00 00 01 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x1 ll
       2:       15 01 03 00 00 00 00 00 if r1 == 0x0 goto +0x3 <LBB0_2>
       3:       18 01 00 00 01 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x1 ll
       5:       63 1a fc ff 00 00 00 00 *(u32 *)(r10 - 0x4) = r1

0000000000000030 <LBB0_2>:
       6:       b7 00 00 00 00 00 00 00 r0 = 0x0
       7:       95 00 00 00 00 00 00 00 exit

@lmb
Copy link
Collaborator

lmb commented Feb 20, 2024

We could just copy that.

Yeah let's do that. The call instruction is already confusing enough.

@dylandreimerink dylandreimerink force-pushed the feature/fix-1348 branch 2 times, most recently from e43a484 to 36e4a6b Compare February 21, 2024 11:18
A CO-RE relocation might be poisoned, in such cases we write a "bogus"
instruction to the relocation target. This can still result in working
code if that instruction turns out to be in a dead code branch due to
other CO-RE logic.

Currently this bogus instructions is always a function call to a
non-existing function. This is a problem when the original instruction
is a dword load immediate, which takes 2 instructions worth of space.
When we replace this with a function call, we shrink the instruction
stream which throws off offsets and instruction metadata.

So this commit makes makes the CO-RE logic check if we are dealing with
a dword load immediate, and if so, we replace it with a dword imm load
of 0xbad2310 to R10, which is illigal and will trip the verifier if
evaluated.

Signed-off-by: Dylan Reimerink <[email protected]>
Added a test which makes sure we do not mess up offsets when replacing
a ld64imm with poisoned instructions.

Signed-off-by: Dylan Reimerink <[email protected]>
Copy link
Collaborator

@lmb lmb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@lmb lmb merged commit b24722c into cilium:main Feb 21, 2024
15 checks passed
@lmb
Copy link
Collaborator

lmb commented Feb 21, 2024

From discussion on Slack: going with the write to RFP trick, since that means we can avoid breaking API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't load CO-RE eBPF code that accesses enum values
2 participants