Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

viv: insn: string: handle viv bug around substrings #1273

Merged
merged 6 commits into from
Jan 19, 2023
Merged

Conversation

williballenthin
Copy link
Collaborator

@williballenthin williballenthin commented Jan 9, 2023

closes #1271

try to detect the case in which viv incorrectly returned a substring length and truncate the returned data appropriately. i think this will work better for ASCII than for UTF16 because we can better guess about embedded NULL bytes.

Checklist

  • No CHANGELOG update needed
  • No new tests needed
  • No documentation update needed

@williballenthin williballenthin added the bug Something isn't working label Jan 9, 2023
@williballenthin williballenthin added this to the 5.0.0 milestone Jan 9, 2023
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

Comment on lines +285 to +288
if b"\x00" in buf:
# account for bug #1271.
# remove when vivisect is fixed.
buf = buf.partition(b"\x00")[0]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should really never be the case, but viv is broken

Comment on lines +308 to +310
# partition to account for bug #1271.
# remove when vivisect is fixed.
return read_memory(vw, offset, ulen).decode("utf-16").partition("\x00")[0]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're not really able to guess at the correct string length. this has an edge case where we have UTF16 | NULL | junk and when we try to decode as UTF16 then it fails due to the junk. we'll fail to extract some valid strings in this case, which is unfortunate.

@@ -627,6 +629,8 @@ def parametrize(params, values, **kwargs):
("mimikatz", "function=0x40105D", capa.features.common.String("ACR > "), True),
("mimikatz", "function=0x40105D", capa.features.common.String("nope"), False),
("773290...", "function=0x140001140", capa.features.common.String(r"%s:\\OfficePackagesForWDAG"), True),
# overlapping string, see #1271
("294b8d...", "function=0x404970,bb=0x404970,insn=0x40499F", capa.features.common.String("\r\n"), True),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we expect this string?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct. the sample contains the string "\r\n\r\n\x00HTTP" and this string is fetched using &string[2]. this test demonstrates the substring handling works as expected (and no embedded NULL bytes).

capa supports whitespace in strings, including newlines. filtering that out would be another discussion.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification.

  1. In capa/features/extractors/strings.py we do not extract \r\n so this seems inconsistent between file and instruction scope?
  2. We have a lint to warn on strings with length smaller than 4, "capa only extracts strings with length >= 4"?

Should we make this uniform across the extractors or keep as is (since we have more knowledge at the instruction scope, for example)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, great points!

lets create two new issues, one for each of those points.

this is still the only test case i know of for this issue, but maybe we can find another that more clearly shows the problem.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe test here instead that the wrong string is not extracted?

@github-actions github-actions bot dismissed their stale review January 11, 2023 12:46

CHANGELOG updated or no update needed, thanks! 😄

@mr-tz
Copy link
Collaborator

mr-tz commented Jan 11, 2023

ca9c940 closes #1278

@mr-tz
Copy link
Collaborator

mr-tz commented Jan 11, 2023

@williballenthin, please review

Copy link
Collaborator Author

@williballenthin williballenthin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me (cant review cause im an author)

@mr-tz mr-tz merged commit 5513d4c into master Jan 19, 2023
@mr-tz mr-tz deleted the fix/issue-1271 branch January 19, 2023 12:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

better extract overlapping strings
2 participants