Add backfillmoderationdecision management command #4415

Merged
krysal merged 6 commits into main from add/ModerationDecision_backfill_cmd
Jun 11, 2024

Conversation

krysal
Member

@krysal krysal commented May 31, 2024

Fixes

Fixes #3641 by @sarayourfriend

Description

This PR adds the command described in the implementation plan section "Deprecating and removing report status". It takes three options: the media type, the username of the moderator (opener in production, deploy locally), and a flag that controls whether to run the command or only print the count of reports to process.

Verify that the mapping is according to the specification.

Testing Instructions

To test manually, create some reports, set their status to something other than "pending", and run the command.

just dj backfillmoderationdecision [--dry-run | --no-dry-run] [--media-type {image,audio}] --moderator deploy
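
For readers unfamiliar with paired `--dry-run` / `--no-dry-run` flags, the usage line above can be reproduced with plain `argparse`. This is only a sketch, not the PR's code: the `build_parser` helper is hypothetical, and the defaults (dry run on, media type `image`) and the required `--moderator` are assumptions inferred from the usage line.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the usage line; not taken from the PR."""
    parser = argparse.ArgumentParser(prog="backfillmoderationdecision")
    # BooleanOptionalAction (Python 3.9+) generates both --dry-run and --no-dry-run.
    parser.add_argument(
        "--dry-run",
        action=argparse.BooleanOptionalAction,
        default=True,  # assumption: default to the safe, count-only mode
        help="Only count the reports that would be processed.",
    )
    parser.add_argument(
        "--media-type",
        choices=["image", "audio"],
        default="image",  # assumption: image when the option is omitted
    )
    # Assumption: the moderator username is mandatory.
    parser.add_argument("--moderator", required=True)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--no-dry-run", "--moderator", "deploy"])
    print(args.dry_run, args.media_type, args.moderator)
```

In a Django management command the same `add_argument` calls would go inside `add_arguments(self, parser)`.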

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.
  • I ran the DAG documentation generator (just catalog/generate-docs for catalog
    PRs) or the media properties generator (just catalog/generate-docs media-props
    for the catalog or just api/generate-docs for the API) where applicable.

Developer Certificate of Origin

Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@github-actions github-actions bot added the 🧱 stack: api Related to the Django API label May 31, 2024
@openverse-bot openverse-bot added 🟨 priority: medium Not blocking but should be addressed soon 🌟 goal: addition Addition of new feature 💻 aspect: code Concerns the software code in the repository 🚦 status: awaiting triage Has not been triaged & therefore, not ready for work labels May 31, 2024
@krysal krysal removed the 🚦 status: awaiting triage Has not been triaged & therefore, not ready for work label May 31, 2024
@krysal krysal force-pushed the add/ModerationDecision_backfill_cmd branch from cb2d606 to c6c6d4b Compare May 31, 2024 20:50
@krysal krysal marked this pull request as ready for review May 31, 2024 20:55
@krysal krysal requested a review from a team as a code owner May 31, 2024 20:55
@krysal krysal requested review from dhruvkb and stacimc May 31, 2024 20:55
Collaborator

@sarayourfriend sarayourfriend left a comment

Left a comment with just some fiddly ORM details, but if the number of non-pending reports in production is low enough, it probably won't even matter.

Review comment on api/api/management/commands/backfillmoderationdecision.py (outdated, resolved)
Collaborator

@stacimc stacimc left a comment

In my tests, both Image and Audio reports with the status "mature_filtered" did not get their action set to "confirmed_sensitive" as I expected (both had an empty action) 🤔 Are you able to reproduce that, @krysal?

@krysal
Member Author

krysal commented Jun 4, 2024

@stacimc I do get the "confirmed_sensitive" action for a "mature_filtered" report. What output for the command do you get?

@krysal krysal force-pushed the add/ModerationDecision_backfill_cmd branch from 3daf454 to 4d97b12 Compare June 4, 2024 17:38
@sarayourfriend
Collaborator

@krysal I don't see any MediaDecisionThrough being created when I run this. The decisions are created, but not the through table records. I thought this would happen automatically, but it doesn't. What's more, it isn't possible to set it directly on the media; it needs to be handled through image.imagedecision_set on the media (or the equivalent for audio), or on the decision with decision.media_objs.set(). But if we have to do it manually anyway, we can also bulk create the through records, and this appears to work for me locally:

diff --git a/api/api/management/commands/backfillmoderationdecision.py b/api/api/management/commands/backfillmoderationdecision.py
index f88e40769..e9340604a 100644
--- a/api/api/management/commands/backfillmoderationdecision.py
+++ b/api/api/management/commands/backfillmoderationdecision.py
@@ -5,7 +5,7 @@ from django.contrib.auth import get_user_model
 from django_tqdm import BaseCommand
 
 from api.constants.moderation import DecisionAction
-from api.models import AudioDecision, AudioReport, ImageDecision, ImageReport
+from api.models import AudioDecision, AudioReport, ImageDecision, ImageReport, ImageDecisionThrough, AudioDecisionThrough
 from api.models.media import DMCA, MATURE_FILTERED, NO_ACTION, PENDING
 
 
@@ -43,9 +43,11 @@ class Command(BaseCommand):
 
         MediaReport = ImageReport
         MediaDecision = ImageDecision
+        MediaDecisionThrough = ImageDecisionThrough
         if media_type == "audio":
             MediaReport = AudioReport
             MediaDecision = AudioDecision
+            MediaDecisionThrough = AudioDecisionThrough
 
         non_pending_reports = MediaReport.objects.filter(decision=None).exclude(
             status=PENDING
@@ -83,6 +85,15 @@ class Command(BaseCommand):
             for report, decision in zip(reports_chunk, decisions):
                 report.decision = decision
             MediaReport.objects.bulk_update(reports_chunk, ["decision"])
+            MediaDecisionThrough.objects.bulk_create(
+                [
+                    MediaDecisionThrough(
+                        media_obj_id=report.media_obj_id,
+                        decision_id=report.decision_id,
+                    )
+                    for report in reports_chunk
+                ]
+            )
             t.update(1)
 
         t.info(

Collaborator

@sarayourfriend sarayourfriend left a comment

LGTM except for the issue with the decision through table, which I've described in a comment. Otherwise this is working fine when I test it, I did not have the issue Staci saw and explicitly made sure a decision was created in that case. However, I wonder if Staci is seeing an issue in Django admin caused by the lack of the decision through? Not sure what the downstream effects are currently of a missing decision through row for a media object.

@dhruvkb
Member

dhruvkb commented Jun 5, 2024

I'll hold off on reviewing this PR till you've had a chance to address @sarayourfriend's requests for changes.

The relationship between the media object and the decision can be created more succinctly using this

decision.media_objs.add(media_obj)

where decision is the ImageDecision/AudioDecision instance and media_obj is the Image/Audio instance respectively. There is code very similar to the one you've created in the media_report.py file (maybe there is scope to extract them both into a separate util, but that might be me prematurely DRY-ing)

decision = form.save(commit=False)
decision.moderator = request.user
decision.save()
logger.info(
    "Decision created",
    decision=decision.id,
    action=decision.action,
    notes=decision.notes,
    moderator=request.user.get_username(),
)
decision.media_objs.add(media_obj)
logger.info(
    "Media linked to decision",
    decision=decision.id,
    media_obj=media_obj.id,
)
reports = form.cleaned_data["reports"]
count = reports.update(decision=decision)
logger.info(
    "Decision recorded in reports",
    report_count=count,
    decision=decision.id,
)

@sarayourfriend
Collaborator

The problem with doing it through the decision is that you have to go one-by-one, there's no way to make it a bulk operation; hence why I suggested bulk_create within the existing bulk creation loop. The relationships here are always 1-to-1, because of the nature of the backfill.

In other cases, where a single decision is being created for multiple media objects, then yes, setting it on the decision (or media) instance is more straightforward and doesn't require manually creating the through table instances. In this case, though, we should use bulk create, in line with the rest of the code.
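
To make the trade-off concrete, here is a minimal, framework-free sketch of the bulk approach: one batch of link rows is built per chunk of reports, instead of one `decision.media_objs.add(...)` call per report. The `Report` and `Through` dataclasses, `chunked`, `through_rows`, and `backfill_links` are hypothetical stand-ins for illustration only; in the actual command each batch would be handed to `MediaDecisionThrough.objects.bulk_create(...)` as in the diff above.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, Iterator, List, Optional


@dataclass
class Report:
    """Stand-in for a media report that already had its decision assigned."""
    media_obj_id: str
    decision_id: Optional[int] = None


@dataclass(frozen=True)
class Through:
    """Stand-in for a through-table row linking one media object to one decision."""
    media_obj_id: str
    decision_id: int


def chunked(items: Iterable[Report], size: int) -> Iterator[List[Report]]:
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def through_rows(reports_chunk: List[Report]) -> List[Through]:
    """Build every link row for a chunk at once (1-to-1 with its reports)."""
    return [
        Through(report.media_obj_id, report.decision_id)
        for report in reports_chunk
        if report.decision_id is not None
    ]


def backfill_links(reports: Iterable[Report], chunk_size: int) -> List[List[Through]]:
    """Return one batch of link rows per chunk; each batch maps to one bulk insert."""
    return [through_rows(chunk) for chunk in chunked(reports, chunk_size)]


if __name__ == "__main__":
    reports = [Report("a", 1), Report("b", 2), Report("c", 3)]
    for batch in backfill_links(reports, chunk_size=2):
        print(len(batch), batch)
```

With three reports and a chunk size of two, this produces two batches, so the number of insert statements scales with the number of chunks rather than the number of reports.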

@krysal krysal marked this pull request as draft June 5, 2024 14:17
@krysal krysal force-pushed the add/ModerationDecision_backfill_cmd branch from 9ac7591 to 6358bd7 Compare June 5, 2024 21:22
@krysal krysal force-pushed the add/ModerationDecision_backfill_cmd branch from 6358bd7 to df449e4 Compare June 5, 2024 21:31
@krysal krysal requested a review from sarayourfriend June 5, 2024 21:46
@krysal
Member Author

krysal commented Jun 5, 2024

@sarayourfriend Good catch! I think the bulk updates of MediaDecision skip the MediaDecisionThrough table/model. Thanks for the handy diff, the changes have been applied.

@krysal krysal marked this pull request as ready for review June 6, 2024 14:21
Collaborator

@stacimc stacimc left a comment

LGTM! No longer seeing issues with confirmed_sensitive. I checked all action types and double-checked that the relationships are all set correctly. Looks great 👍

Collaborator

@sarayourfriend sarayourfriend left a comment

LGTM!

@krysal krysal merged commit 0094a1d into main Jun 11, 2024
54 checks passed
@krysal krysal deleted the add/ModerationDecision_backfill_cmd branch June 11, 2024 13:11
Labels
💻 aspect: code Concerns the software code in the repository 🌟 goal: addition Addition of new feature 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: api Related to the Django API
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Create ModerationDecision backfill management command
5 participants