Improve trend unquarantine performance #2420

@josh-feather (Contributor) commented Dec 3, 2024

Prior to these changes, `trend_unquarantine` passed the entire sample through an XOR routine before checking the XOR'd file header to determine whether it was a quarantined file.

For a 100MB sample, XORing the file took ~22 seconds.

%Own   %Total  OwnTime  TotalTime  Function
0.00%   0.00%   21.58s    21.58s   bytearray_xor (lib/cuckoo/common/quarantine.py)

Note: `bytearray_xor` is called by `trend_unquarantine`.

By changing the function to XOR only the first 10 bytes of the file (the header) before checking the file signature, we avoid a significant amount of processing time for what will likely be almost all submitted samples.
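
The header-only check described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `is_trend_header`, the single-byte XOR key `0xFF`, the magic value, and the `<IIH` header layout are all assumptions made for the example.

```python
import struct

# All three constants are assumptions for illustration only.
XOR_KEY = 0xFF         # assumed single-byte XOR key
MAGIC = 0x58425356     # assumed magic value of a quarantined file
HEADER_LEN = 10        # size of the header that gets checked


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


def is_trend_header(data: bytes) -> bool:
    """Decide whether data looks quarantined by XORing only the header."""
    if len(data) < HEADER_LEN:
        return False
    # Only the first HEADER_LEN bytes are XOR'd, however large the sample is.
    header = xor_bytes(data[:HEADER_LEN], XOR_KEY)
    magic, _data_offset, _num_tags = struct.unpack("<IIH", header)
    return magic == MAGIC
```

The rest of the file only needs to be XOR'd once this check has passed, so samples that are not quarantined files cost a 10-byte XOR rather than a pass over the whole buffer.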

Using the same sample, post-change, the whole `unquarantine` process now takes only 0.770s.

%Own   %Total  OwnTime  TotalTime  Function
0.00%   0.00%   0.040s    0.770s   unquarantine (lib/cuckoo/common/quarantine.py)

Note: `trend_unquarantine` is called by `unquarantine`.

Ultimately, I'm not convinced that `trend_unquarantine` works anyway, because:

  • there were syntax errors in the arguments passed to the `.decode` calls.
  • `read_trend_tag` returns a `tuple[int, bytes]`: the tag code and the tag data. When unpacking the original filename tag (tag code 2), the code passes the tag data (bytes) into `str`, encodes the result, and then decodes it again. The original filename is therefore only ever the string representation of a bytes object (see below), which causes the downstream functions to fail (in my testing, without a proper quarantined file).
>>> str(b"hello").encode("utf16").decode(errors="ignore")
"b\x00'\x00h\x00e\x00l\x00l\x00o\x00'\x00"  
>>> str("hello".encode("utf16")).encode("utf16").decode(errors="ignore")
"b\x00'\x00\\\x00x\x00f\x00f\x00\\\x00x\x00f\x00e\x00h\x00\\\x00x\x000\x000\x00e\x00\\\x00x\x000\x000\x00l\x00\\\x00x\x000\x000\x00l\x00\\\x00x\x000\x000\x00o\x00\\\x00x\x000\x000\x00'\x00"

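For contrast, decoding the tag data directly avoids the `b'...'` wrapper seen in the output above. This is only a sketch under an assumption: it treats the filename tag as UTF-16-LE bytes, which I have not confirmed against a real quarantined file, and `decode_filename_tag` is a hypothetical helper name, not a function in `quarantine.py`.

```python
def decode_filename_tag(tag_data: bytes) -> str:
    """Decode a filename tag, assuming it holds UTF-16-LE encoded text."""
    # Decode the bytes directly instead of wrapping them in str() first,
    # then drop any trailing NUL terminator.
    return tag_data.decode("utf-16-le", errors="ignore").rstrip("\x00")


def broken_decode(tag_data: bytes) -> str:
    """Reproduce the str() round trip described above, for comparison."""
    return str(tag_data).encode("utf16").decode(errors="ignore")
```

With this, `decode_filename_tag("sample.exe".encode("utf-16-le"))` gives back `sample.exe`, whereas the round trip through `str` yields a string that starts with `b` and is interleaved with NUL characters.
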
@doomedraven doomedraven merged commit 5035457 into kevoreilly:master Dec 3, 2024
3 checks passed
@doomedraven (Collaborator) commented

nice finding, thank you
