Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MachO segments can have zero filesize, matching offset to other segments #1130

Closed
peledins-zimperium opened this issue Dec 5, 2024 · 10 comments
Assignees

Comments

@peledins-zimperium
Copy link
Contributor

Describe the bug
Offset of segment is used as unique identifier, for example in: offset_seg_ map.
This is incorrect as they can match in cases when offset filesize is 0.

To Reproduce
Steps to reproduce the behavior:
a.cpp:

int z[100];
int main()
{
}
clang++ ./a.cpp
objdump --headers ./a.out

./a.out:	file format mach-o arm64

Sections:
Idx Name          Size     VMA              Type
  0 __text        00000008 0000000100003fa0 TEXT
  1 __unwind_info 00000058 0000000100003fa8 DATA
  2 __common      00000190 0000000100004000 BSS

__common is in __DATA, __unwind_info is in __TEXT. __DATA has nothing inside, as it is all zeroes/uninitialized.

Issue: if we try to shift the binary, TEXT segment is not found.

Expected behavior

should not use offset as identifier; should be able to shift __text section of such a binary.

Environment (please complete the following information):

  • System and Version : MacOS
  • Target format Mach-O
  • LIEF commit version: 15.1.- f4d6835

Additional context
Add any other context about the problem here.

NOTICE

If the issue does not contain enough information to be reproduced,
it will be flagged as incomplete
and closed.

/NOTICE

@romainthomas
Copy link
Member

I think it's a duplicate of #1087. Could you test with a nightly build?

@peledins-zimperium
Copy link
Contributor Author

I suspect that the fix is not enough, because there are other locations where offset_seg_ is updated, e.g.:

diff --git a/include/LIEF/MachO/Binary.hpp b/include/LIEF/MachO/Binary.hpp
index fa31d406..0268d1ef 100644
--- a/include/LIEF/MachO/Binary.hpp
+++ b/include/LIEF/MachO/Binary.hpp
@@ -827,6 +827,11 @@ class LIEF_API Binary : public LIEF::Binary  {
     return this->is64_ ? sizeof(uint64_t) : sizeof(uint32_t);
   }
 
+  //! Add a segment command in \p offset_seg_ if its file_size > 0.
+  //! 0-file-sized segments are not interesting as no offset can be found
+  //! inside them.
+  void add_offset_seg(SegmentCommand* segment);
+
   bool        is64_ = true;
   Header      header_;
   commands_t  commands_;
diff --git a/src/MachO/Binary.cpp b/src/MachO/Binary.cpp
index 64fbf255..b41f63a4 100644
--- a/src/MachO/Binary.cpp
+++ b/src/MachO/Binary.cpp
@@ -1557,7 +1557,7 @@ size_t Binary::add_cached_segment(SegmentCommand& segment) {
     segments_.insert(it_linkedit, &segment);
   }
 
-  offset_seg_[segment.file_offset()] = &segment;
+  add_offset_seg(&segment);
   if (LinkEdit::segmentof(segment)) {
     auto& linkedit = static_cast<LinkEdit&>(segment);
     linkedit.dyld_           = dyld_info();
@@ -2426,11 +2426,16 @@ ExportInfo* Binary::add_exported_function(uint64_t address, const std::string& n
   return nullptr;
 }
 
+void Binary::add_offset_seg(SegmentCommand* segment) {
+  if (segment->file_size() > 0) {
+    offset_seg_[segment->file_offset()] = segment;
+  }
+}
 
 void Binary::refresh_seg_offset() {
   offset_seg_.clear();
   for (SegmentCommand* segment : segments_) {
-    offset_seg_[segment->file_offset()] = segment;
+    add_offset_seg(segment);
   }
 }
 
diff --git a/src/MachO/BinaryParser.tcc b/src/MachO/BinaryParser.tcc
index 5fb34f37..d5b33115 100644
--- a/src/MachO/BinaryParser.tcc
+++ b/src/MachO/BinaryParser.tcc
@@ -258,7 +258,8 @@ ok_error_t BinaryParser::parse_load_commands() {
 
           auto* segment = load_command->as<SegmentCommand>();
           segment->index_ = binary_->segments_.size();
-          binary_->offset_seg_[segment->file_offset()] = segment;
+          binary_->add_offset_seg(segment);
+
           binary_->segments_.push_back(segment);
 
           if (segment->name() == "__TEXT" && imagebase < 0) {

@peledins-zimperium
Copy link
Contributor Author

But I hit some issues further even with this and suspect that there is some other location where problems arise.

@peledins-zimperium
Copy link
Contributor Author

peledins-zimperium commented Dec 9, 2024

% pip install --index-url https://lief.s3-website.fr-par.scw.cloud/latest lief==0.16.0.dev0
Looking in indexes: https://lief.s3-website.fr-par.scw.cloud/latest
Requirement already satisfied: lief==0.16.0.dev0 in /Users/peteris.ledins/exp/lief_venv/lib/python3.13/site-packages (0.16.0.dev0)
% cat a.cpp
int z[100];
int main()
{
}
% clang a.cpp
% python3    
Python 3.13.0 (main, Oct  7 2024, 05:02:14) [Clang 16.0.0 (clang-1600.0.26.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import lief
>>> a=lief.MachO.parse("./a.out")
>>> bin=a.take(lief.MachO.Header.CPU_TYPE.ARM64)
>>> bin.shift(10000)
<lief._lief.ok_t object at 0x105c88af0>
>>> bin.write("b.out")

Results:
in a.out __DATA segment has vm address of 0000000100004000, which is after __TEXT and 0000000100000000.
After shifting __text is at 00000001000066B0, but __DATA is still at 0000000100004000... which seems incorrect.
llvm fails to open this binary.
I did not investigate further as I can work around it for my purposes.

@romainthomas
Copy link
Member

I think you already had a nightly version and your pip install did not take the latest one.
Could you try with the --ignore-installed flag:

$ pip install --ignore-installed --index-url https://lief.s3-website.fr-par.scw.cloud/latest lief==0.16.0.dev0

On my end, the binary seems correctly shifted:

2c2
< a.out:        file format mach-o arm64
---
> new.out:      file format mach-o arm64
6,8c6,8
<   0 __text        00000008 0000000100003fa0 TEXT
<   1 __unwind_info 00000058 0000000100003fa8 DATA
<   2 __common      00000190 0000000100004000 BSS
---
>   0 __text        00000008 0000000100004388 TEXT
>   1 __unwind_info 00000058 0000000100004390 DATA
>   2 __common      00000190 00000001000043e8 BSS

@peledins-zimperium
Copy link
Contributor Author

I think the objdump printout is not sufficient for seeing the problem; it becomes sufficient if delta is larger though:

% pip install --ignore-installed --index-url https://lief.s3-website.fr-par.scw.cloud/latest lief==0.16.0.dev0 
Looking in indexes: https://lief.s3-website.fr-par.scw.cloud/latest
Collecting lief==0.16.0.dev0
  Using cached https://lief.s3-website.fr-par.scw.cloud/latest/lief/lief-0.16.0.dev0-cp313-cp313-macosx_11_0_arm64.whl (2.7 MB)
Installing collected packages: lief
Successfully installed lief-0.16.0.dev0
% python3
Python 3.13.0 (main, Oct  7 2024, 05:02:14) [Clang 16.0.0 (clang-1600.0.26.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import lief
>>> a=lief.MachO.parse("./a.out")
>>> bin=a.take(lief.MachO.Header.CPU_TYPE.ARM64)
>>> bin.shift(100000)
<lief._lief.ok_t object at 0x10341c850>
>>> bin.write("c.out")
>>> 
% objdump --headers ./c.out
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/objdump: error: './c.out': truncated or malformed object (addr field plus size of section 0 in LC_SEGMENT_64 command 2 greater than than the segment's vmaddr plus vmsize)

@romainthomas
Copy link
Member

You are right. Checking for a fix!

@peledins-zimperium
Copy link
Contributor Author

No need to update add_cached_segment and this?:

void Binary::refresh_seg_offset() {
  offset_seg_.clear();
  for (SegmentCommand* segment : segments_) {
    offset_seg_[segment->file_offset()] = segment;
  }
}

Both would cause trouble when file_offset=0

@romainthomas
Copy link
Member

You are right! Thank you!

@romainthomas
Copy link
Member

Actually my fix is not correct.

romainthomas added a commit that referenced this issue Dec 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants