-
Notifications
You must be signed in to change notification settings - Fork 54
Add DAG to report reported media pending review #513
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works great! And will be really useful for all of us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM and is undoubtedly necessary for us to have.
However, what is actionable about this information? Do we have documentation about how to handle these reports or guidelines on expectations (like SLAs)?
Until we have those things I don't think they should go into the alerts channel, which should ideally only include actionable information. That can either mean not enabling the DAG until we've developed those processes or temporarily putting them into our notifications channel until they're actually actionable.
report_counts_by_media_type: dict containing report counts per media type, per | ||
report reason |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like something went wonky on the formatting for this comment.
report_counts_by_media_type: dict containing report counts per media type, per | |
report reason | |
report_counts_by_media_type: dict containing report counts per media type, per | |
report reason |
is this what it's meant to be, maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did mean it to look like this -- generally when the description goes onto multiple lines, maintaining the alignment makes it more readable (for me at least). I'd say it's pretty necessary if there are multiple arguments (example), although maybe it looks strange when there's only one 🤷♀️
Co-authored-by: sarayourfriend <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic! I can't wait!
openverse_catalog/dags/database/report_pending_reported_media.py
Outdated
Show resolved
Hide resolved
logger = logging.getLogger(__name__) | ||
|
||
DAG_ID = "report_pending_reported_media" | ||
DB_CONN_ID = os.getenv("OPENLEDGER_API_CONN_ID", "postgres_openledger_api") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I made an issue to tackle naming conventions RE: openledger #535
DB_CONN_ID = os.getenv("OPENLEDGER_API_CONN_ID", "postgres_openledger_api") | ||
|
||
REPORTS_TABLES = { | ||
"image": "nsfw_reports", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I made an issue to normalize these table names so we can generate this as f"nsfw_reports_{media_type}"
in the future WordPress/openverse-api#719
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 Thanks for creating an issue!
slack.send_message( | ||
"No records require review at this time :tada:", | ||
username="Reported Media Check-In", | ||
) | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to prevent alert/notification fatigue, I think that it might be best not to send a slack message in this case 🙂 We can just emit a log message and exit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me it feels useful to have a weekly peace-of-mind status report regardless -- this just eliminates the situation where you feel like you haven't heard from Reported Media in awhile and must go manually check DAG logs. I do understand the concern about notification fatigue, but this would be once weekly in the crop of notifications that come in over the weekend.
I don't feel especially strongly about this though! It seems likely that these notifications may change as we update our processes for dealing with reported media at any rate 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair! You're right, once weekly isn't too frequent 😄
openverse_catalog/dags/database/report_pending_reported_media.py
Outdated
Show resolved
Hide resolved
slack.send_alert(message, username="Reported Media Requires Review") | ||
|
||
|
||
def create_dag(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This totally isn't necessary since it's already written, but small DAGs like this would be a perfect trial for the TaskFlow API!. I'm not sure how dynamic DAGs would work in it though... 🤔 maybe just a loop calling the airflow.decorators.dag
decorator.
These are really important questions and to my knowledge we don't have the answers right now. It's possible that depending on what processes we introduce, the alerts may even need to be directed to an entirely different channel eventually. For the moment I think there's value to at least gaining insight into the problem as a first step. |
Would it be distracting to redirect them into the notifications channel for the time being then? It'd be nice if things in the alerts channel were things that needed immediate attention and had clear actionability, like Sentry issue traiging or failed DAG runs (although I'm not sure if failed DAG runs have a clear action documented 🙂 if they do it'd be nice to link to the document describing what to do in the alerts for them). |
I see your point, but it doesn't feel quite right to me to move this to notifications. Reported media is a problem and does require attention, and while the processes/SLAs for acting on them aren't clear, just moving it out of the channel seems like it would increase the likelihood of it getting lost and unaddressed. The notifications channel is also much more busy than alerts. I think I may have the unpopular opinion here 😄 I can remove the notification for when no reports are found, and move the other one to the notifications channel. We would just need to be very deliberate about following up. Alternatively we can just wait on this PR until the process is solidified, since it's certain to change. |
This does make it sound like there's some action you're expecting to take when these come through 🙂 Even if the actions today are temporary "we're still figuring out what this means/documenting it/etc" it'd be nice to write that down and link to it from the notification so anyone is immediately able to either help progress that forward or know that they can ignore it and someone else who is keeping tabs on the particular project will come around to it after them. |
I personally feel like the particularities of the channels and response to these notifications is an implementation detail we can handle outside of the repository. To that end, I've started a dialog with WP Photo directory folks about potentially sharing moderators and moderator policies. Until then I can personally handle these reports weekly. |
I created WordPress/openverse#1541 to track writing a runbook for handling reported media, and adjusting Slack notifications as necessary. |
Fixes
Fixes WordPress/openverse#1659
Description
Adds a
report_pending_reported_images
DAG that runs weekly and for each media type:pending_review
status, meaning they require manual review through the Django AdminAdditional Notes
pending_review
associated with it, when you update any of these records all the other records associated to that image are updated as well. (This is why I think it's best to report distinct images requiring review)Testing Instructions
This requires a connection to the API DB (rather than the Catalog), so you'll need to create the connection env variables. You can use the values set in
env.template
in this PR. Make sure you also have the API running locally. To test the slack messages you can create the Airflow variableslack_message_override
and set it totrue
.Run the
report_pending_reported_media
DAG locally. If you're running sample data, there should be no reports and you should see theNo records to report
message.To create sample data to test the report count, you can report some images and audio (locally!) through the API. I went to
localhost:8000/v1/images
and grabbed theid
for a couple of the listed images. For each I then navigated tolocalhost:8000/v1/<id>/report
and POSTed a report. You can verify that these went through by checking outhttp://localhost:8000/admin/api/imagereport/?
Likewise for Audio. Report a mix of different reasons (some 'mature', some 'other', etc).Run the DAG again and make sure the correct numbers are reported. I also played around with manually reviewing some of the media reports (updating their status to something other than
pending_review
in the Admin) and verifying they were no longer included in the count; and also reporting the same records multiple times and verifying that it is only counted once in the output.Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin