Describe the bug
Whenever one parses a Mach-O file containing a section whose contents are exactly one byte long, the computed entropy for it is -0.0 instead of exactly 0.0, which is a bit surprising at first. The result still satisfies >= 0.0 because -0.0 == 0.0, but str(-0.0) == str(0.0) does not hold, since the - is kept in the string representation of the float. Having the result consistent with the intuition of >= 0.0 would be nice, especially if one compares string representations in CI, for example.
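For reference, both properties can be checked with plain IEEE 754 doubles; the following is a standalone illustration, independent of LIEF:

```cpp
#include <cmath>
#include <iostream>

int main() {
  const double neg_zero = -0.0;
  std::cout << (neg_zero == 0.0) << '\n';       // 1: -0.0 compares equal to 0.0
  std::cout << (neg_zero >= 0.0) << '\n';       // 1: so the >= 0.0 property still holds
  std::cout << neg_zero << '\n';                // -0: but the sign is kept when printing
  std::cout << std::signbit(neg_zero) << '\n';  // 1: because the sign bit is set
  return 0;
}
```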
System and Version: Debian testing (Linux debian 6.5.0-1-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.5.3-1 (2023-09-13) x86_64 GNU/Linux).
Target format: observed with Mach-O, but the same could apply to other formats (see below).
Python version: 3.10.13.
LIEF commit version: 0.12.3-39115d10.
Additional context
I think the problem comes from the following subtlety: in src/Abstract/Section.cpp's Section::entropy implementation, one can see that entropy += freq * std::log2l(freq); in the loop progressively sums the entropy up, and that at the end return (-entropy); returns the accumulated value, negated so that it becomes positive.
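The pattern being described boils down to the following; this is a minimal sketch assuming a plain byte-frequency histogram, not the exact LIEF source:

```cpp
#include <array>
#include <cmath>
#include <cstdint>
#include <vector>

// Shannon entropy in the accumulate-then-negate style described above.
long double entropy(const std::vector<uint8_t>& content) {
  if (content.empty()) {
    return 0.;  // the loop below never runs for empty input
  }
  std::array<uint64_t, 256> counts{};
  for (uint8_t byte : content) {
    ++counts[byte];
  }
  long double entropy = 0.;
  for (uint64_t count : counts) {
    if (count == 0) { continue; }
    const long double freq = static_cast<long double>(count) / content.size();
    entropy += freq * std::log2l(freq);  // every term is <= 0.0
  }
  return -entropy;  // negation turns an exact +0.0 sum into -0.0
}
```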
Listing cases:
In the case of an empty section (last one in the example):
the loop is not run, because of the if (content.empty()) check above it,
so the default 0. is returned.
When the section contains more than one distinct byte value (first two cases in the example):
the probabilities will all be strictly less than 1.0,
so their std::log2l values will be strictly negative,
and the finally returned value will therefore be strictly positive.
However, when content.size() == 1 (second to last case in the example):
the byte count for the single value will be exactly 1,
and the associated frequency will also be exactly 1.0 (because 1.0 / 1.0),
so its std::log2l will be exactly 0.0;
therefore, at the end of the loop, which ran only once, entropy is exactly 0.0,
hence the function returns exactly -0.0, since negating +0.0 yields -0.0 under IEEE 754 (the same applies to a longer section filled with a single repeated byte value; see the snippet after this list).
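With the sketch from above in scope, the faulty and healthy cases can be reproduced directly; the driver below is hypothetical:

```cpp
#include <iostream>

int main() {
  std::cout << entropy({0x42}) << '\n';        // -0: one byte, freq == 1.0
  std::cout << entropy({0x42, 0x42}) << '\n';  // -0: same for any single repeated value
  std::cout << entropy({0x00, 0x01}) << '\n';  // 1: two equiprobable values
  return 0;
}
```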
A potential fix would therefore be to apply both of the following changes together (sketched below):
Use entropy -= freq * std::log2l(freq); in the loop instead.
Use return entropy; at the end instead.
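For illustration, here is the loop from the sketch above with both changes applied; again a sketch, not an actual patch:

```cpp
long double entropy = 0.;
for (uint64_t count : counts) {
  if (count == 0) { continue; }
  const long double freq = static_cast<long double>(count) / content.size();
  entropy -= freq * std::log2l(freq);  // subtracting non-positive terms keeps entropy >= +0.0
}
return entropy;  // 0.0 - 0.0 is exactly +0.0 under IEEE 754, so -0.0 can no longer escape
```

Note that the two changes only work in tandem: flipping += to -= while keeping return (-entropy); would negate genuinely positive results.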