Censor rules are sometimes ineffective #1156
Comments
I've struggled with censor rules at: https://www.whatdotheyknow.com/admin/requests/33740 Ideally rule 1724 would have worked; it didn't. I needed different censor rules for the quoted response to the original. In relation to part of that I used the censor rule to note the change and edited the outgoing message. |
Is it easy to determine if a rule has had any effect? (Presumably if anything was matched by what the rule was to apply to it would have an effect?) Could we give an "are you sure" warning when an attempt is made to create a rule which has no effect? |
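A minimal sketch of the "has this rule had any effect?" check suggested above, assuming a rule is either a literal string or a regular expression and that the message texts it applies to are available as plain strings. The function name `rule_has_effect` and the sample messages are hypothetical; this is not how Alaveteli implements censor rules.

```python
import re


def rule_has_effect(pattern: str, texts: list[str], is_regex: bool = False) -> bool:
    """Return True if the rule matches (and so would change) at least one text."""
    # A literal rule is escaped so that punctuation such as "-" or "(" is
    # matched as-is rather than being read as regex syntax.
    compiled = re.compile(pattern if is_regex else re.escape(pattern))
    return any(compiled.search(text) for text in texts)


# Hypothetical messages: the original, and a quoted copy with "> " prefixes.
messages = [
    "Dear Council,\nPlease send the report.\n",
    "> Dear Council,\n> Please send the report.\n",
]

print(rule_has_effect("Please send the report.", messages))    # True
print(rule_has_effect("Dear Council, Please send", messages))  # False
```

The second rule matches nothing because the text it was typed from has a line break where the rule has a space. An admin interface could use a check like this to show an "are you sure?" warning before saving such a rule.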
This is an issue which I'm encountering every couple of days. Today the first censor rule I applied at: https://www.whatdotheyknow.com/admin/requests/109902 covered a paragraph and did remove it from the original outgoing message. However, it didn't work for a copy in a further outgoing message, or where it was quoted in replies. I understand the line breaking is different in those latter occurrences, and I couldn't manage to create rules which would apply to them. In one case I resorted to editing the raw outgoing message; in the other I applied line-by-line censor rules. |
Censor rule 1761 at https://www.whatdotheyknow.com/admin/requests/155205 didn't work. I removed the text from the outgoing message instead (leaving the rule in place as a record). To remove the material where it was quoted in a reply I tried copying and pasting the text as it appeared in that reply (rule 1762); that didn't work either. I created line-by-line censor rules, which did work. |
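One possible way to cope with the differing line breaks described above is to build a whitespace-tolerant pattern from the text to be removed, so that the same rule matches the original, a re-wrapped copy, and a copy quoted with ">" markers. This is only a sketch: `flexible_pattern` is a hypothetical helper, and whether a pattern like this behaves the same way in Alaveteli's own regex option has not been verified here.

```python
import re


def flexible_pattern(phrase: str) -> re.Pattern:
    """Build a regex matching the phrase however its whitespace is wrapped."""
    words = phrase.split()
    # Between words, accept any run of whitespace (including newlines),
    # optionally followed by ">" quote markers at the start of a wrapped line.
    gap = r"\s+(?:>+\s*)*"
    return re.compile(gap.join(re.escape(word) for word in words))


original = "Please send the report for 2012."
quoted = "> Please send the\n> report for 2012.\n"

rule = flexible_pattern(original)
print(original in quoted)                    # False: a literal rule misses the quoted copy
print(rule.sub("[REDACTED]", original))      # [REDACTED]
print(repr(rule.sub("[REDACTED]", quoted)))  # '> [REDACTED]\n'
```

The trade-off is that the pattern is slightly more permissive than the literal text (it would also match the words separated by a stray ">"), which seems acceptable for redaction but is worth bearing in mind.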
Another request where a multi-line censor rule didn't work and I spent a while doing it line by line: https://www.whatdotheyknow.com/request/disabled_and_elderly_respite_car I also had problems with censoring punctuation like "-" but I may just not have tried hard enough. |
More examples where multi-line censor rules would have been useful can be seen at: https://www.whatdotheyknow.com/admin/requests/158698 (again there I edited the raw outgoing messages after using the [ineffective] censor rules to note the original content). Maybe we need an additional way of thinking about this censoring, e.g. removing everything from point x to point y in the relevant message, as an option alongside trying to match and replace specific text? |
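The "point x to point y" idea could look something like the sketch below, which removes everything between a start marker and an end marker in a single message, regardless of the line breaks in between. The function `redact_between`, its parameters, and the sample message are all hypothetical illustrations rather than an existing Alaveteli feature.

```python
def redact_between(text: str, start: str, end: str,
                   replacement: str = "[REDACTED]") -> str:
    """Replace the span from the first `start` to the next `end`, inclusive."""
    begin = text.find(start)
    if begin == -1:
        return text  # start marker not present: leave the text untouched
    finish = text.find(end, begin + len(start))
    if finish == -1:
        return text  # end marker not present after the start: leave untouched
    return text[:begin] + replacement + text[finish + len(end):]


message = "Dear Council,\nMy address is 1 High\nStreet, Anytown.\nYours faithfully"
print(redact_between(message, "My address", "Anytown."))
# Dear Council,
# [REDACTED]
# Yours faithfully
```

Only two short markers need to match exactly, so a line break in the middle of the removed span no longer matters.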
Another example where I resorted to line by line censor rules for the incoming message and editing the outgoing |
I've edited the title of this issue to make it more general. Often removal of material from PDF documents is hard; see for example: https://www.whatdotheyknow.com/admin/requests/228030 See also the EditingRequests page under WhatDoTheyKnow on the mySociety old internal wiki. |
At https://www.whatdotheyknow.com/admin/requests/272538 we want to redact a number from an Excel .xlsx document. The censor rules aren't working on it, presumably for a reason related to the structure of the file. The number is there in plain text once the .xlsx file is opened/extracted to reveal the plain text files within. |
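One plausible explanation, offered as an assumption rather than a diagnosis of Alaveteli's behaviour: an .xlsx (or .docx) file is a ZIP archive whose XML members are normally compressed, so text that is readable after extraction does not appear as-is in the raw file bytes. The sketch below builds a toy archive standing in for a spreadsheet (it is not a valid workbook, and the number is a placeholder) and compares searching the raw bytes with searching the extracted members. `members_containing` is a hypothetical helper.

```python
import io
import zipfile


def members_containing(archive_bytes: bytes, needle: bytes) -> list[str]:
    """List the archive members whose decompressed content contains needle."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return [name for name in archive.namelist()
                if needle in archive.read(name)]


# Build a toy, deflate-compressed archive in memory (assumed stand-in for .xlsx).
needle = b"07700900123"
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    rows = "".join(f"<row><c><v>{n}</v></c></row>" for n in range(200))
    archive.writestr("xl/worksheets/sheet1.xml",
                     rows + "<row><c><v>07700900123</v></c></row>")
raw = buffer.getvalue()

print(needle in raw)                    # is the text visible in the raw file bytes?
print(members_containing(raw, needle))  # which extracted members contain it?
```

If that is the cause, applying rules to such attachments would mean extracting each member, applying the rule, and re-packing the archive, rather than matching against the file as uploaded.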
I used lots of censor rules at: an attempt at a multi-line rule failed; I didn't attempt things like replacing line breaks with spaces or other modifications to try and make it work. |
Another example where lots of line by line censor rules were (I think) required |
Discussion of better documenting the regex option already present in Alaveteli: |
Just adding a +1 following a case on WhatDoTheyKnow where having censor rules work on a .xlsx document would have been useful. See previous comment on this thread: #1156 (comment) |
Adding a note as today I wanted to invoke a censor rule on a docx document at https://www.whatdotheyknow.com/admin/requests/489665. The text to redact is present in plain text in the docx document, but it's within directories in the docx structure. Currently censor rules don't work on such documents. |
In some cases censor rules applied to PDFs corrupt the output PDF. In a case I was investigating recently, it looks like re-compressing the PDF results in it being created with a different PDF version. Presumably something else in the PDF is incompatible with the new version, so breaks in some viewers:
Even adding a censor rule that does nothing results in different (corrupt) output:
|
Noting a discussion group thread on which it is stated
works to match newlines in Alaveteli |
Surprisingly an attempted censor rule on WhatDoTheyKnow
on
didn't work, apparently because it went across two lines of text as wrapped for display on the public page. This is confusing as there were no line breaks shown when viewing
Censor rules for just one line of text do work. This one might be worth looking at to seek to understand the issues with line breaks and censor rules. |
I had challenges today with plain text censor rules on plain text emails. I was focused on getting the job done rather than identifying the issues but I think issues were caused by:
|
Related - being able to stop displaying "show quoted sections" would help with censor rule creation significantly, it reduces the amount of material one needs to censor. |
I had a case where I wanted to redact
to
to remove material from a PDF. Similar rules with different characters in the brackets, e.g.
to
did work |
This issue has been automatically closed due to a lack of discussion or resolution for over 12 months. |
This was raised, and discussed, at the now closed #33
There is still an issue with long censor rules not always working. I suspect it's an issue related to line breaks or odd characters.
A recent example is at:
https://www.whatdotheyknow.com/admin/requests/173631
I resorted to editing the outgoing message there as censor rule attempts were ineffective. I have left the ineffective censor rules in place, in part as a record of what has been removed.
Simpler rules which don't work can be seen in censor rules I've applied to a test request at:
https://www.whatdotheyknow.com/admin/requests/10375
One of those rules has a "this works" comment; the others don't work, even though I've made them by copying and pasting either from the raw message in the admin interface or from the public thread.