ScanTranscode - Convert New/Uncommon Image Formats #324
def scan(self, data, file, options, expire_at):
    output_format = options.get("output_format", "jpeg")

    def convert(im):
        with io.BytesIO() as f:
            im.save(f, format=f"{output_format}", quality=90)
            return f.getvalue()

    # Send extracted file back to Strelka
    self.emit_file(convert(Image.open(io.BytesIO(data))), name=file.name)

    self.flags.append("transcoded")
I know we discussed global exception handling - curious how you want to approach adding exceptions here. My first thought would be something like...
def scan(self, data, file, options, expire_at):
    output_format = options.get("output_format", "jpeg")

    def convert(im):
        with io.BytesIO() as f:
            try:
                im.save(f, format=f"{output_format}", quality=90)
                return f.getvalue()
            except ValueError:
                self.flags.append(f"{self.__class__.__name__} Exception: Invalid format or quality.")
            except OSError:
                self.flags.append(f"{self.__class__.__name__} Exception: Unsupported format or invalid image file.")
            except AttributeError:
                self.flags.append(f"{self.__class__.__name__} Exception: Data is not a bytes-like object.")
            except Exception as e:
                self.flags.append(f"{self.__class__.__name__} Exception: {str(e)[:50]}")

    # Send extracted file back to Strelka
    try:
        self.emit_file(convert(Image.open(io.BytesIO(data))), name=file.name)
    except Exception as e:
        self.flags.append(f"{self.__class__.__name__} Exception: Failed to emit file")
        return

    self.flags.append("transcoded")
Too much? Too specific to scanner?
I catch UnidentifiedImageError now, which is what's thrown when a broken image is passed to Image.open. That seems fair if we expect a lot of exceptions from badly formatted or truncated image files. I'd like to add other specific exceptions as they come up while running Strelka.
Added a test for a broken image.
I'm not keen on a broader catch. If emit_file() itself fails, I think that should be handled inside of emit_file(). I added code to add a flag when emit_file fails, and log the exception. If emit_file fails, it's likely due to a coordinator connectivity problem, and we shouldn't suppress those exceptions.
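A minimal sketch of the narrow handling described above, assuming Pillow is installed. The `transcode` helper, the `flags` list (standing in for `self.flags`), and the `unidentified_image` flag name are hypothetical and not taken from the PR; only `UnidentifiedImageError` is caught, so any other error still propagates.

```python
import io

from PIL import Image, UnidentifiedImageError


def transcode(data, flags, output_format="jpeg"):
    """Hypothetical helper: convert image bytes, flagging unreadable input."""
    try:
        im = Image.open(io.BytesIO(data))
    except UnidentifiedImageError:
        # Raised by Image.open when the bytes are not a recognizable image.
        flags.append("unidentified_image")  # assumed flag name
        return None

    with io.BytesIO() as f:
        # Same save call as the scanner: chosen format, quality 90.
        im.save(f, format=output_format, quality=90)
        flags.append("transcoded")
        return f.getvalue()
```

With a valid image the helper returns the re-encoded bytes and appends `transcoded`; with unrecognizable bytes it returns `None` and appends the assumed error flag instead.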
Describe the change
HEIC/AVIF/HEIF images are appearing more frequently as platforms such as mobile devices adopt these newer encoding schemes for better compression. Support for these codecs is missing in common tools and modules, like the tesseract dependency in Strelka for OCR.
The transcode process is relatively fast, e.g. 0.023 sec for a ~100KB image.
Adds a scanner that can convert e.g. HEIC/AVIF/HEIF files into other formats. Default conversion is to jpeg with quality 90 (preserves OCR quality without inflating file sizes).
Adds tests for three new fixtures transcoded into each of the six available formats, plus one test for broken images.
Adds negative matches in ScanJpeg, ScanEofPng, ScanEofBmp, ScanNf, ScanLsb for ScanTranscode.
Improves exception handling in emit_file().
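One possible reading of the emit_file() change is: record a flag, log the exception, and let it propagate rather than suppress it. A minimal sketch under that assumption follows; the `Scanner` class, the `send_to_coordinator` callable, and the `emit_file_error` flag name are hypothetical stand-ins, not Strelka's actual code.

```python
import logging


class Scanner:
    """Hypothetical stand-in for a scanner base class."""

    def __init__(self, send_to_coordinator):
        self.flags = []
        # Callable that ships bytes to the coordinator (assumed interface).
        self._send = send_to_coordinator

    def emit_file(self, data, name=""):
        try:
            self._send(data, name)
        except Exception:
            # Record the failure, log the traceback, then re-raise so that
            # connectivity problems are not silently suppressed.
            self.flags.append("emit_file_error")  # assumed flag name
            logging.exception("failed to emit file %r", name)
            raise
```

Keeping this logic inside emit_file() means individual scanners such as ScanTranscode do not need their own broad `except Exception` around the call.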
Describe testing procedures
============================= test session starts ==============================
platform linux -- Python 3.10.6, pytest-7.2.0, pluggy-1.0.0
rootdir: /strelka
plugins: mock-3.10.0, unordered-0.5.2
collected 121 items
tests/test_scan_tar.py .
tests/test_scan_tlsh.py .
tests/test_scan_transcode.py ...................
tests/test_scan_upx.py .
tests/test_scan_url.py ..
====================== 122 passed, 28 warnings in 45.33s =======================
Sample output
Sample transcode:
Same image without transcoding:
Checklist