[Feature] Use metadata keywords to help detect if something is NSFW #482

aldenstpage · 2020-04-21T17:24:09Z

Problem Description

We are trying to make NSFW content in CC Search "opt-in". We can catch a lot of NSFW content by using API specific filters and relying on moderation "upstream" at the source, but sometimes things slip through.

Solution Description

One way we can help prevent this is to scan the title, tags, and artist name for profanity and slurs, and set nsfw = True in the metadata field when a record fails the check. There are third-party lists of dirty words that can help us achieve this. In my experience moderating content on CC Search, this will help prevent a lot of embarrassment and indignant emails from teachers.

We can do a one-time scan-and-filter relatively easily, but we will also need a way to filter new content as it is ingested.
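A minimal sketch of the scan described above, assuming each record is a dict. The field names (`title`, `creator`, `tags`, `meta_data`), the `flag_nsfw` helper, and the placeholder blocklist are all hypothetical; a real run would load a reviewed third-party word list and use the catalog's actual schema.

```python
import re

# Placeholder terms only; stand-ins for a reviewed third-party list.
BLOCKLIST = {"examplebadword", "anotherbadword"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def flag_nsfw(record, blocklist=BLOCKLIST):
    """Return a copy of the record with meta_data['nsfw'] = True
    when the title, creator, or any tag contains a blocked word."""
    fields = [record.get("title") or "", record.get("creator") or ""]
    fields.extend(record.get("tags") or [])
    tokens = {tok for field in fields for tok in tokenize(field)}

    flagged = dict(record)
    meta = dict(record.get("meta_data") or {})
    if tokens & set(blocklist):
        meta["nsfw"] = True
    flagged["meta_data"] = meta
    return flagged


def scan_all(records, blocklist=BLOCKLIST):
    """One-time pass over existing records (hypothetical batch helper)."""
    return [flag_nsfw(record, blocklist) for record in records]
```

The same `flag_nsfw` function could be reused for both the one-time pass and per-record checks at ingestion, so the two paths cannot drift apart.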

Additional Context

The Scunthorpe Problem

aldenstpage · 2020-04-21T18:05:17Z

Also: we're going to need to review the list of words carefully, because the lists that I linked to are too broad in what they consider NSFW and could inadvertently censor legitimate content.

brenoferreira · 2020-04-21T19:29:43Z

One thing to watch out for in this word list is the potential for false positives that can end up filtering out a lot of content with words that aren't necessarily NSFW.

Edit: when I commented I had the tab open for a while, so @aldenstpage's comment hadn't loaded yet :D
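To illustrate the false-positive concern raised in the comments above, here is a small sketch comparing naive substring matching with whole-word matching. The blocklist entry `bad` is a harmless stand-in for a blocked term, and both function names are hypothetical. Whole-word matching reduces Scunthorpe-style false positives but does not address words that are only NSFW in some contexts, so the list itself still needs review.

```python
import re

BLOCKLIST = {"bad"}  # hypothetical stand-in for a blocked term


def substring_match(text, blocklist=BLOCKLIST):
    """Naive check: flags any text containing a blocked term anywhere."""
    lowered = text.lower()
    return any(word in lowered for word in blocklist)


def whole_word_match(text, blocklist=BLOCKLIST):
    """Stricter check: flags only when a blocked term appears as a whole word."""
    if not blocklist:
        return False
    alternatives = "|".join(re.escape(word) for word in sorted(blocklist))
    pattern = r"\b(?:" + alternatives + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# "Badminton" contains the stand-in term only as a substring.
print(substring_match("Badminton tournament"))   # True  (false positive)
print(whole_word_match("Badminton tournament"))  # False
print(whole_word_match("A bad day"))             # True
```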

kss682 · 2020-04-22T05:42:07Z

For new content, we could add a validator method to the ImageStore class that checks the title, author, and other relevant attributes before inserting into the TSV, so that NSFW content can be flagged and segregated at an early stage. @aldenstpage
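A rough sketch of what such a validator might look like. `SketchImageStore`, its `add_item` signature, and the TSV column layout are assumptions made for illustration and do not reflect the real ImageStore API; the blocklist is a placeholder.

```python
import csv
import io
import re

BLOCKLIST = {"examplebadword"}  # placeholder; use a reviewed list in practice


class SketchImageStore:
    """Hypothetical stand-in for ImageStore; not the real cccatalog class."""

    def __init__(self, output, blocklist=BLOCKLIST):
        # Rows are written as tab-separated values to the given file object.
        self._writer = csv.writer(output, delimiter="\t", lineterminator="\n")
        self._blocklist = set(blocklist)

    def _is_nsfw(self, *fields):
        """Validator: True if any field contains a blocked whole word."""
        tokens = set()
        for field in fields:
            tokens.update(re.findall(r"[a-z0-9']+", (field or "").lower()))
        return bool(tokens & self._blocklist)

    def add_item(self, title, creator, tags=()):
        """Validate the record, then write it with an nsfw column."""
        nsfw = self._is_nsfw(title, creator, *tags)
        self._writer.writerow([title, creator, ",".join(tags), str(nsfw).lower()])
        return nsfw


buffer = io.StringIO()
store = SketchImageStore(buffer)
store.add_item("Sunset", "alice", ["beach"])
store.add_item("Examplebadword", "bob")
print(buffer.getvalue())
```

Flagging at write time keeps the record in the TSV rather than dropping it, which leaves room for later review of false positives.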

aldenstpage self-assigned this Apr 23, 2020

aldenstpage transferred this issue from cc-archive/cccatalog Apr 23, 2020

aldenstpage mentioned this issue May 4, 2020

NSFW filter cc-archive/cccatalog-frontend#901

Merged

cc-archive deleted a comment from Mr-burme Jul 23, 2020

kgodey added not ready for work labels Aug 13, 2020

kgodey added 🚧 status: blocked Blocked & therefore, not ready for work 🧹 status: ticket work required Needs more details before it can be worked on and removed not ready for work labels Sep 24, 2020

cc-open-source-bot added the 🏷 status: label work required Needs proper labelling before it can be worked on label Dec 2, 2020

kgodey added 🙅 status: discontinued Not suitable for work as repo is in maintenance and removed 🚧 status: blocked Blocked & therefore, not ready for work 🧹 status: ticket work required Needs more details before it can be worked on labels Dec 16, 2020

kgodey closed this as completed Dec 16, 2020

obulat mentioned this issue Feb 22, 2023

[Feature] Use metadata keywords to help detect if something is NSFW (original #482) WordPress/openverse#750

Closed
