
Section::entropy returns -0.0 for a one-byte section #976

Closed
PaulDance opened this issue Oct 5, 2023 · 2 comments
@PaulDance

Describe the bug
Whenever one parses a Mach-O file containing a section whose contents are only one byte long, the computed entropy is -0.0 instead of exactly 0.0, which is a bit surprising at first. It still satisfies the property entropy >= 0.0 because -0.0 == 0.0, but not str(-0.0) == str(0.0), because the - sign is kept in the string representation of the float. Having it consistent with the intuition of >= 0.0 would be nice, especially if one uses the string representation in CI, for example.
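For reference, the negative-zero behavior described here can be demonstrated in plain Python, with no LIEF involved (this is standard IEEE 754 semantics, not anything LIEF-specific):

```python
# -0.0 compares equal to +0.0, so the >= 0.0 property still holds...
neg_zero = -0.0
assert neg_zero == 0.0
assert neg_zero >= 0.0

# ...but the sign survives in the string representation, which is what
# breaks string-based comparisons (e.g. in CI logs or golden files).
assert str(neg_zero) == "-0.0"
assert str(0.0) == "0.0"
print("negative-zero demo passed")
```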

To Reproduce

>>> import lief
>>> lief.MachO.Section("weird_section", [1, 2, 3]).entropy
1.584962500721156
>>> lief.MachO.Section("weird_section", [1, 2]).entropy
1.0
>>> lief.MachO.Section("weird_section", [1]).entropy
-0.0
>>> lief.MachO.Section("weird_section", []).entropy
0.0

Expected behavior

>>> lief.MachO.Section("weird_section", [1]).entropy
0.0

Environment:

  • System and Version: Debian testing (Linux debian 6.5.0-1-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.5.3-1 (2023-09-13) x86_64 GNU/Linux).
  • Target format: found for Mach-O, but the same could be applicable to other formats (see below).
  • Python version: 3.10.13.
  • LIEF commit version: 0.12.3-39115d10.

Additional context
I think the problem comes from the following subtlety. In the Section::entropy implementation in src/Abstract/Section.cpp, the loop progressively sums the entropy with entropy += freq * std::log2l(freq);, and at the end return (-entropy); is used to return the sum, turned positive first.
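The accumulation scheme can be sketched in plain Python (illustrative only; the actual implementation is the C++ code referenced above), which reproduces the -0.0 result for a one-byte input:

```python
import math
from collections import Counter

def entropy_like_lief(content: bytes) -> float:
    """Python sketch of the C++ accumulation described above
    (not LIEF's actual code)."""
    if not content:
        return 0.0
    acc = 0.0
    for count in Counter(content).values():
        freq = count / len(content)
        acc += freq * math.log2(freq)  # each term is <= 0.0
    return -acc  # final negation: yields -0.0 when acc is exactly 0.0

print(entropy_like_lief(bytes([1])))  # -0.0
```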

Listing cases:

  • In the case of an empty section (last one in the example):
    • the loop is not run because of the if (content.empty()) check above it,
    • so the default 0. is returned.
  • When the section has more than one byte (first two cases in the example):
    • each probability is strictly less than 1.0,
    • so each std::log2l is strictly negative,
    • and the finally-returned value is therefore strictly positive.
  • However, when content.size() == 1 (second-to-last case in the example):
    • the byte count for the single value is exactly 1,
    • so the associated frequency is exactly 1.0 (because 1.0 / 1.0),
    • so its std::log2l is exactly 0.0,
    • therefore, at the end of the loop that ran only once, entropy is exactly 0.0,
    • and the function returns exactly -0.0 because it negates it.

A potential fix would therefore be:

  • Use entropy -= freq * std::log2l(freq); instead.
  • Use return entropy; instead.
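Applied to the same Python sketch as above, the proposed change (subtract inside the loop, return the accumulator directly) keeps the running total non-negative, so the one-byte case produces a clean +0.0:

```python
import math
from collections import Counter

def entropy_fixed(content: bytes) -> float:
    """Python sketch of the proposed fix (names are illustrative,
    not LIEF's actual code)."""
    if not content:
        return 0.0
    entropy = 0.0
    for count in Counter(content).values():
        freq = count / len(content)
        entropy -= freq * math.log2(freq)  # subtracting a <= 0.0 term
    return entropy  # no final negation: 0.0 - 0.0 stays +0.0

print(entropy_fixed(bytes([1])))  # 0.0
```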
@romainthomas
Member

romainthomas commented Oct 7, 2023

Hi @PaulDance ,

Thank you very much for the well-detailed issue (highly appreciated!). I agree with your remark, and
I would go for:

[...]
if (content.empty() || content.size() == 1) {
  return 0.;
}
[...]
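In the Python sketch from earlier in the thread, this early-return guard would look like the following (illustrative only: a zero- or one-byte section carries no information, so it short-circuits to exactly 0.0 before any arithmetic):

```python
import math
from collections import Counter

def entropy_with_guard(content: bytes) -> float:
    """Sketch of the suggested guard (not LIEF's actual code)."""
    if len(content) <= 1:
        return 0.0  # avoids the -0.0 result entirely
    acc = 0.0
    for count in Counter(content).values():
        freq = count / len(content)
        acc += freq * math.log2(freq)
    return -acc  # strictly positive here, since len(content) >= 2

print(entropy_with_guard(bytes([1])))  # 0.0
```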

@PaulDance
Author

Thank you very much @romainthomas for the quick response!
