Command to re-send validation emails #689

AetherUnbound · 2022-05-10T00:08:19Z

Description

As a result of https://github.com/WordPress/openverse-api/releases/tag/v2.5.0, API token requests will now appropriately create validation emails. However, we need to perform this process for the existing applications.

@sarayourfriend has suggested a Django command that could be run on a production box which would send out the validation email to those who should have received it in the first place. There are a number of ThrottledApplications that folks made during testing that don't use legitimate emails. There are also plenty of duplicates where folks tried slightly different names (e.g. LietKynes, liet-kynes, liet_kynes, etc.). Per Sara:

We can take all the unique email addresses with unverified applications and take the application with the earliest creation date (we can safely assume that’s the one least likely to be a “dang it didn’t work, let me try something else”) then we can send the email with that token specifically and delete the rest of the tokens and applications associated with the email address.
Getting the list of verified email addresses would require joining across from the oauth2registration table to throttledapplication on the name column for both, filtering on verified = True from throttled application.

There are only 260 applications right now (257 of which are unverified), so we can likely do this all in one go rather than batching.

Additional context

Resolution

🙋 I would be interested in resolving this bug.

The text was updated successfully, but these errors were encountered:

dhruvkb · 2022-05-10T12:27:07Z

I think this issue can be broken down into two components:

a command to send the email for the given application IDs (this seems reusable as sometimes we might want to resend the emails)
a one-time script that we can run to find all the pending application IDs (this is the non-reusable part of the work that should not be a part of the codebase imo)

sarayourfriend · 2022-05-10T15:27:03Z

The issue with splitting them out @dhruvkb is then we'll have to also manage running the Django command with complex input.

If we want to run the command with different parameters for who to send the emails to in the future, we can just update the command.

The command also needs to clean up all but the last application and verification and everything and that part in particular is the most crucial part to test carefully in this as it will modify production data.

dhruvkb · 2022-05-10T15:33:16Z

I don't feel good about adding code to the repo for one-off activity, that will inevitably need to be modified before it can be reused/repurposed. But I'm not strongly against it. We can proceed with your suggestion if you feel justified to do so.

sarayourfriend · 2022-05-10T15:35:30Z

Why are you against adding such code? I've used django commands this way extensively in the past to be able to test scripts that run against and modify production data. Are there other ways to accomplish that?

I'd rather add it to the repository too because then the history is recorded in a public and searchable way. A script someone runs that isn't ever put into version control means we could lose it and have to redo all the work for it in the future if we run into the same problem.

With a one off command we can write it, test it, run it, and then delete it from the codebase.

dhruvkb · 2022-05-10T15:40:47Z

It's just that the script is not a part of the code, it is a manual correction of data. It's good to have it searchable and version controlled (which is why I don't oppose it strongly) but I don't think it should live with the code itself. Some places where it could be placed are:

a standalone scripts repository
a write-up of the email bug and the solutions taken to fix it (including this script attached)
a handbook page documenting the steps and including the script

I've seen these methods being used in other places but none of these really fit our situation 100%, which is why I reluctantly agreed with your idea to have it in the code 👍.

sarayourfriend · 2022-05-10T15:54:51Z

The problem with a standalone script is that it inevitably requires secrets management and a machine to run it on. This should not be the way production data is manipulated. If we have a django command then it's already wired up to the production data, it's unit tested (with existing infrastructure for carefully unit testing it like being able to use the Django ORM), and the way to run it is standardized and does not require inventing or adopting new technology or infrastructure.

It's just that the script is not a part of the code, it is a manual correction of data.

I think I just don't see this as being a problem, fundamentally. There's not really a separation to me of what is "code" and what is a one-off tool we're using to correct production data. Is the issue that someone using Openverse outside of our particular deployment context won't care about that code? Even that's not an issue to me... they can just ignore it (as they will other things that are particular to our canonical deployment (for example, in the future, several GitHub workflows for deploying and rolling back our services).

There are other things in the codebase that are particular and maybe "not part of the code" in that way:

openverse-api/api/catalog/settings.py

Lines 63 to 67 in 24d4b03

    
           SHORT_URL_WHITELIST = { 
        
               "api-dev.openverse.engineering", 
        
               "api.openverse.engineering", 
        
               "localhost:8000", 
        
           }

This is hyperspecific to our deployment.

The terms of service as well, that is also not part of the code nor generally usable by anyone else trying to deploy Openverse, but we leave it in the code base.

Sorry to harp on this, but I do want to be clear for future things like this so we know exactly how we plan to address these sorts of data remediation issues in a way that is organized, unit tested, able to be safely run in production with existing tools, documented, and version controlled.

AetherUnbound added 🐍 tech: python Requires familiarity with Python 💻 aspect: code Concerns the software code in the repository 🔧 tech: django Requires familiarity with Django 🛠 goal: fix Bug fix 🟥 priority: critical Must be addressed ASAP labels May 10, 2022

AetherUnbound assigned sarayourfriend May 10, 2022

sarayourfriend mentioned this issue May 11, 2022

Add command to resend verification emails #694

Merged

7 tasks

sarayourfriend closed this as completed in #694 May 17, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Command to re-send validation emails #689

Command to re-send validation emails #689

AetherUnbound commented May 10, 2022

dhruvkb commented May 10, 2022

sarayourfriend commented May 10, 2022

dhruvkb commented May 10, 2022

sarayourfriend commented May 10, 2022

dhruvkb commented May 10, 2022 •

edited

Loading

sarayourfriend commented May 10, 2022

Command to re-send validation emails #689

Command to re-send validation emails #689

Comments

AetherUnbound commented May 10, 2022

Description

Additional context

Resolution

dhruvkb commented May 10, 2022

sarayourfriend commented May 10, 2022

dhruvkb commented May 10, 2022

sarayourfriend commented May 10, 2022

dhruvkb commented May 10, 2022 • edited Loading

sarayourfriend commented May 10, 2022

dhruvkb commented May 10, 2022 •

edited

Loading