-
Notifications
You must be signed in to change notification settings - Fork 565
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
viv: insn: string: handle viv bug around substrings #1273
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased)
section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed
if b"\x00" in buf: | ||
# account for bug #1271. | ||
# remove when vivisect is fixed. | ||
buf = buf.partition(b"\x00")[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should really never be the case, but viv is broken
# partition to account for bug #1271. | ||
# remove when vivisect is fixed. | ||
return read_memory(vw, offset, ulen).decode("utf-16").partition("\x00")[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're not really able to guess at the correct string length. this has an edge case where we have UTF16 | NULL | junk
and when we try to decode as UTF16 then it fails due to the junk. we'll fail to extract some valid strings in this case, which is unfortunate.
tests/fixtures.py
Outdated
@@ -627,6 +629,8 @@ def parametrize(params, values, **kwargs): | |||
("mimikatz", "function=0x40105D", capa.features.common.String("ACR > "), True), | |||
("mimikatz", "function=0x40105D", capa.features.common.String("nope"), False), | |||
("773290...", "function=0x140001140", capa.features.common.String(r"%s:\\OfficePackagesForWDAG"), True), | |||
# overlapping string, see #1271 | |||
("294b8d...", "function=0x404970,bb=0x404970,insn=0x40499F", capa.features.common.String("\r\n"), True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we expect this string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct. the sample contains the string "\r\n\r\n\x00HTTP" and this string is fetched using &string[2]
. this test demonstrates the substring handling works as expected (and no embedded NULL bytes).
capa supports whitespace in strings, including newlines. filtering that out would be another discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification.
- In
capa/features/extractors/strings.py
we do not extract\r\n
so this seems inconsistent between file and instruction scope? - We have a lint to warn on strings with length smaller than 4, "capa only extracts strings with length >= 4"?
Should we make this uniform across the extractors or keep as is (since we have more knowledge at the instruction scope, for example)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, great points!
lets create two new issues, one for each of those points.
this is still the only test case i know of for this issue, but maybe we can find another that more clearly shows the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe test here instead that the wrong string is not extracted?
CHANGELOG updated or no update needed, thanks! 😄
@williballenthin, please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me (cant review cause im an author)
closes #1271
try to detect the case in which viv incorrectly returned a substring length and truncate the returned data appropriately. i think this will work better for ASCII than for UTF16 because we can better guess about embedded NULL bytes.
Checklist