Evaluate Dangerzone's Potential as a Redaction Tool (and add redaction capabilities) #763
Comments
If the previewer ends up using PDFs rather than images, we can apparently use fitz for that (the linked issue would not affect us if the doc was already rasterized once).
Could dangerzone convert the text in the pdf to a .txt file which the journalist could redact manually? Things like black boxes still give away the length of the word being redacted. Then, could a tool be used to convert the redacted text into a pdf document with a template that could be standardized across the industry as a "redacted anti-watermark whistleblowing" template? That way, all watermarks could be removed, except if the corporation or government modifies the text itself a little bit depending on which authorized user is reading it. With corporations already putting invisible watermarks or whatever into their emails, the above idea could help protect sources. One issue is that with the document modified so much, the corporation or government could deny that it is a legitimate document and claim that it is faked. A second issue is, as mentioned, that they could adapt by modifying the words used in the document depending on the authorized viewer. Third, the leaked material might have important images or diagrams that need to be part of the document but which contain undetectable watermarks. And a fourth issue is that readers / viewers of the general public may be too ignorant to understand why these sorts of measures are necessary, causing them to doubt the authenticity of the document or be manipulated by propaganda. So idk. A tool like the one I presented in the first paragraph might still be useful though.
To the above reservations, I'd add the fact that some documents may have two columns of text, pictures, or formatting elements like tables. If it's a solution that works for 90% of the documents, then we will add some extra mental load to a journalist who is already pressed (given that they are handling a very sensitive document). Still, allowing users to get back just the text of the document, and then post-process it in any way they like, could be a nice fit for a Dangerzone plugin system. I think we had an issue for this, but I can't find it right now.
Dangerzone's goal is protecting the user against malware. However, through the way it works, it also removes metadata, so it can also help with publication security.
The problem
Typical PDF manipulation tools have poorly implemented redaction methods that can be reversed. Because Dangerzone already rasterizes documents, it has nothing to lose. When a black box is applied and the page is then rasterized, the covered content is no longer present in the final output.
This is best put in the paper Story Beyond the Eye: Glyph Positions Break PDF Text Redaction (emphasis added):
We're working on turning Dangerzone into a file viewer, and that could be the perfect chance to add redaction tools.
User Story
As a journalist, I'd like to use Dangerzone to help redact documents, ensuring that redactions cannot be reversed.
How could this work?
User journey:
Technical explanation: the host receives all the rasterized images. As the user draws black boxes on a page, an image manipulation module (like Pillow) applies those boxes to the final image. If we want extra rasterization assurances, we can convert the final PDF through Dangerzone one more time to ensure proper rasterization.
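A minimal sketch of the "apply black boxes to the rasterized page" step, to make the idea concrete. This is not Dangerzone code: the function name `redact_rgb`, the `(left, top, right, bottom)` box format, and the flat RGB buffer layout are all assumptions made for illustration. It uses only the standard library so that it is self-contained; with Pillow, `ImageDraw.Draw(img).rectangle(box, fill="black")` would do roughly the same job.

```python
from typing import Iterable, Tuple

# Hypothetical box format: (left, top, right, bottom) in pixels,
# with right/bottom exclusive. Not taken from Dangerzone.
Box = Tuple[int, int, int, int]


def redact_rgb(pixels: bytearray, width: int, height: int,
               boxes: Iterable[Box]) -> None:
    """Overwrite every pixel inside each box with black, in place.

    `pixels` is assumed to be a flat, row-major RGB buffer
    (3 bytes per pixel) for a single rasterized page.
    """
    if len(pixels) != width * height * 3:
        raise ValueError("buffer size does not match width * height * 3")

    for left, top, right, bottom in boxes:
        # Clamp the box to the page so out-of-range input cannot
        # write outside the buffer.
        left, top = max(0, left), max(0, top)
        right, bottom = min(width, right), min(height, bottom)
        if left >= right or top >= bottom:
            continue  # empty or fully off-page box

        black_row = bytes(3 * (right - left))  # all-zero bytes = black
        for y in range(top, bottom):
            start = (y * width + left) * 3
            # The original pixel values are replaced, not covered:
            # nothing of them remains in the buffer afterwards.
            pixels[start:start + len(black_row)] = black_row


# Usage: an 8x4 all-white page with one 3x2 redaction box.
width, height = 8, 4
page = bytearray(b"\xff" * (width * height * 3))
redact_rgb(page, width, height, [(2, 1, 5, 3)])
```

Because the box overwrites pixel values instead of being layered on top as a separate object, there is no hidden layer to peel back. It does not address side channels such as the size of the box revealing the length of the redacted text.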
Implementation Risks and Unmitigated Risks
We should keep in mind that redaction alone may not be enough to eliminate all unredaction risks. The best advice is never to publish source documents and, if needed, to retype them. I can think of several other ways that redaction could still be bypassed: