Possible extended ASCII string decompress problem. #490
Comments
In this sample, the VBA string with special characters seems to be
But that exception is hidden by olevba because it uses errors='replace' in VBA_Project.decode_bytes:
This is why the UTF-8 encoded output is incorrect. From Wikipedia on CP1252: "According to the information on Microsoft's and the Unicode Consortium's websites, positions 81, 8D, 8F, 90, and 9D are unused; however, the Windows API MultiByteToWideChar maps these to the corresponding C1 control codes. The "best fit" mapping documents this behavior, too."
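A minimal sketch of the effect described above, assuming the VBA project's code page is CP1252. It uses only Python's standard `cp1252` codec, not olevba itself, and the `c1_fallback` handler name is hypothetical. It shows that strict decoding fails on the five unused positions, that `errors='replace'` collapses all of them into the same replacement character, and that a custom error handler could approximate the "best fit" mapping to C1 control codes instead.

```python
import codecs

# The five byte values that CP1252 leaves undefined (per the quote above).
UNDEFINED = b"\x81\x8d\x8f\x90\x9d"


def c1_fallback(exc):
    """Hypothetical handler: map each undecodable byte to the code point
    with the same numeric value (the corresponding C1 control code)."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(b) for b in bad), exc.end


codecs.register_error("c1_fallback", c1_fallback)


def strict_fails(data):
    """Return True if strict CP1252 decoding raises on this input."""
    try:
        data.decode("cp1252")
    except UnicodeDecodeError:
        return True
    return False


# errors='replace' hides the exception and yields U+FFFD for every byte,
# so five distinct input bytes become one identical UTF-8 sequence.
replaced = UNDEFINED.decode("cp1252", errors="replace")

# The fallback handler keeps the five bytes distinguishable.
best_fit = UNDEFINED.decode("cp1252", errors="c1_fallback")
```

With this handler the defined positions still decode normally (for example `0x80` still becomes the euro sign), so only the five unused positions change behavior. Whether this matches what Office does for every code page is not confirmed here.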
TODO:
There is an additional weird wrinkle to the extended ASCII characters. You have tried copying and pasting from the VBA editor; now try adding a loop that uses Debug.Print on each character in the string with Mid(), copy and paste the debug text, and look at the byte values in that text. In this case the original single-byte value (128 to 255) is used for each of the extended ASCII characters. So it looks like Office uses Unicode for display in the VBA editor, but under the covers it still uses the single-byte extended ASCII values when the string is accessed in VBA (this is also the behavior I see with VBA string decode loops). Maybe there could be an olevba option to choose between display text values and raw/underlying text values?
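To illustrate the distinction between display text and raw values, here is a rough sketch of how the underlying single-byte values could be recovered from an already-decoded string. It assumes a single-byte code page such as CP1252; `raw_byte_values` is a hypothetical helper, not an existing olevba function or option.

```python
def raw_byte_values(text, codepage="cp1252"):
    """Return the single-byte value behind each character of `text`.

    Assumes `codepage` is a single-byte code page. Characters the code
    page cannot encode but whose code point fits in one byte (such as the
    C1 controls at the unused CP1252 positions) keep their numeric value.
    """
    values = []
    for ch in text:
        try:
            values.append(ch.encode(codepage)[0])
        except UnicodeEncodeError:
            if ord(ch) > 0xFF:
                raise  # no single-byte equivalent exists
            values.append(ord(ch))
    return values


# Display form: euro sign, a C1 control, e-acute, and a plain ASCII letter.
display_text = "\u20ac\x81\xe9A"
raw_values = raw_byte_values(display_text)
```

A decode loop emulated on `raw_values` would see 128 for the euro sign rather than its Unicode code point 0x20AC, which is the kind of difference the comment above describes.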
Affected tool:
olevba
Describe the bug
It looks like olevba may be improperly decompressing the values of some VBA strings that contain extended ASCII characters. Several distinct extended ASCII characters in a VBA string produce the same byte sequence in the output of olevba.
File/Malware sample to reproduce the bug
An example Word document is available at https://github.com/kirk-sayre-work/talks/blob/master/test.docm
How To Reproduce the bug
Run oledump.py test.docm -s A3 -v and compare its output with the output of olevba on the same file. The string contents differ between the two tools, and the oledump.py output for the string appears to be the correct one.
Version information:
Additional context
Some maldoc campaigns (currently IcedID) encode payloads in strings with extended ASCII characters. ViperMonkey fails to decode these payloads properly, apparently because of issues with the decompression of the extended ASCII strings.