Skip to content
This repository has been archived by the owner on Feb 22, 2023. It is now read-only.

Command to re-send validation emails #689

Closed
1 task
AetherUnbound opened this issue May 10, 2022 · 6 comments · Fixed by #694
Closed
1 task

Command to re-send validation emails #689

AetherUnbound opened this issue May 10, 2022 · 6 comments · Fixed by #694
Assignees
Labels
💻 aspect: code Concerns the software code in the repository 🛠 goal: fix Bug fix 🟥 priority: critical Must be addressed ASAP 🔧 tech: django Requires familiarity with Django 🐍 tech: python Requires familiarity with Python

Comments

@AetherUnbound
Copy link
Contributor

Description

As a result of https://github.com/WordPress/openverse-api/releases/tag/v2.5.0, API token requests will now appropriately create validation emails. However, we need to perform this process for the existing applications.

@sarayourfriend has suggested a Django command that could be run on a production box which would send out the validation email to those who should have received it in the first place. There are a number of ThrottledApplications that folks made during testing that don't use legitimate emails. There are also plenty of duplicates where folks tried slightly different names (e.g. LietKynes, liet-kynes, liet_kynes, etc.). Per Sara:

We can take all the unique email addresses with unverified applications and take the application with the earliest creation date (we can safely assume that’s the one least likely to be a “dang it didn’t work, let me try something else”) then we can send the email with that token specifically and delete the rest of the tokens and applications associated with the email address.
Getting the list of verified email addresses would require joining across from the oauth2registration table to throttledapplication on the name column for both, filtering on verified = True from throttled application.

There are only 260 applications right now (257 of which are unverified), so we can likely do this all in one go rather than batching.

Additional context

Resolution

  • 🙋 I would be interested in resolving this bug.
@AetherUnbound AetherUnbound added 🐍 tech: python Requires familiarity with Python 💻 aspect: code Concerns the software code in the repository 🔧 tech: django Requires familiarity with Django 🛠 goal: fix Bug fix 🟥 priority: critical Must be addressed ASAP labels May 10, 2022
@dhruvkb
Copy link
Member

dhruvkb commented May 10, 2022

I think this issue can be broken down into two components:

  • a command to send the email for the given application IDs (this seems reusable as sometimes we might want to resend the emails)
  • a one-time script that we can run to find all the pending application IDs (this is the non-reusable part of the work that should not be a part of the codebase imo)

@sarayourfriend
Copy link
Contributor

The issue with splitting them out @dhruvkb is then we'll have to also manage running the Django command with complex input.

If we want to run the command with different parameters for who to send the emails to in the future, we can just update the command.

The command also needs to clean up all but the last application and verification and everything and that part in particular is the most crucial part to test carefully in this as it will modify production data.

@dhruvkb
Copy link
Member

dhruvkb commented May 10, 2022

I don't feel good about adding code to the repo for one-off activity, that will inevitably need to be modified before it can be reused/repurposed. But I'm not strongly against it. We can proceed with your suggestion if you feel justified to do so.

@sarayourfriend
Copy link
Contributor

Why are you against adding such code? I've used django commands this way extensively in the past to be able to test scripts that run against and modify production data. Are there other ways to accomplish that?

I'd rather add it to the repository too because then the history is recorded in a public and searchable way. A script someone runs that isn't ever put into version control means we could lose it and have to redo all the work for it in the future if we run into the same problem.

With a one off command we can write it, test it, run it, and then delete it from the codebase.

@dhruvkb
Copy link
Member

dhruvkb commented May 10, 2022

It's just that the script is not a part of the code, it is a manual correction of data. It's good to have it searchable and version controlled (which is why I don't oppose it strongly) but I don't think it should live with the code itself. Some places where it could be placed are:

  • a standalone scripts repository
  • a write-up of the email bug and the solutions taken to fix it (including this script attached)
  • a handbook page documenting the steps and including the script

I've seen these methods being used in other places but none of these really fit our situation 100%, which is why I reluctantly agreed with your idea to have it in the code 👍.

@sarayourfriend
Copy link
Contributor

The problem with a standalone script is that it inevitably requires secrets management and a machine to run it on. This should not be the way production data is manipulated. If we have a django command then it's already wired up to the production data, it's unit tested (with existing infrastructure for carefully unit testing it like being able to use the Django ORM), and the way to run it is standardized and does not require inventing or adopting new technology or infrastructure.

It's just that the script is not a part of the code, it is a manual correction of data.

I think I just don't see this as being a problem, fundamentally. There's not really a separation to me of what is "code" and what is a one-off tool we're using to correct production data. Is the issue that someone using Openverse outside of our particular deployment context won't care about that code? Even that's not an issue to me... they can just ignore it (as they will other things that are particular to our canonical deployment (for example, in the future, several GitHub workflows for deploying and rolling back our services).

There are other things in the codebase that are particular and maybe "not part of the code" in that way:

SHORT_URL_WHITELIST = {
"api-dev.openverse.engineering",
"api.openverse.engineering",
"localhost:8000",
}

This is hyperspecific to our deployment.

The terms of service as well, that is also not part of the code nor generally usable by anyone else trying to deploy Openverse, but we leave it in the code base.

Sorry to harp on this, but I do want to be clear for future things like this so we know exactly how we plan to address these sorts of data remediation issues in a way that is organized, unit tested, able to be safely run in production with existing tools, documented, and version controlled.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
💻 aspect: code Concerns the software code in the repository 🛠 goal: fix Bug fix 🟥 priority: critical Must be addressed ASAP 🔧 tech: django Requires familiarity with Django 🐍 tech: python Requires familiarity with Python
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants