-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert longer media varchar
fields to text
in the API
#4315
Conversation
8bc9fe8
to
b0695a0
Compare
This PR has migrations. Please rebase it before merging to ensure that conflicting migrations are not introduced. |
Full-stack documentation: https://docs.openverse.org/_preview/4315 Please note that GitHub pages takes a little time to deploy newly pushed code, if the links above don't work or you see old versions, wait 5 minutes and try again. You can check the GitHub pages deployment action list to see the current status of the deployments. Changed files 🔄: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we perform all our searching with Elasticsearch and still have the standard indices, I think it's totally fine to have them be removed (and may even speed up certain operations).
💯. This was my thought exactly when I read those changes in the migration.
LGTM. I've left one remark regarding the URL fields. As clarified there, it can be a fast follow or a separate issue, if you prefer to avoid bike shedding the decision here. In my opinion it isn't controversial to say we don't need write-time validators on models that are never written to using Django ORM, but I also respect that could be seen as a significant change (however much I don't see it to be, given the usage of the models). All of that to say, just want to reiterate it is not a blocker.
class URLTextField(models.TextField): | ||
"""URL field which uses the underlying Postgres TEXT column type.""" | ||
|
||
default_validators = [validators.URLValidator()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Up front context) I know we currently use a URL field for some of these (though this would be a new detail for creator_url
, I think), so this is only relevant feedback for this PR due to the need to add new code to continue supporting it. Just wanting to clarify up front I know this isn't a decision you're making about the model, and that my intention is to ask whether we need this at all, and if not, then to give feedback that this code is unnecessary and can be removed.
This validator is the only thing that differentiates the URL field from a regular text field. I question whether we need this. Not blocking because it's a trivial thing to remove in the future or as a fast-follow if we like (would be a no-SQL migration). I'd prefer we didn't add this code, however. These write-time Django validators are irrelevant for our domain and usage of the ORM because we never write to these tables with the Django ORM (except in tests). Data validation either has to happen in the catalogue, data refresh, or not at all, but definitely not in Django write-time validators.
The change requested would be to use a TextField
and forego write-time validators as a code-quality improvement and clarification of the intention of these models and the domain they're actually concerned with (i.e. reading).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an interesting point. I can't think of a use case right now, but I also think it doesn't hurt to leave the validation for this kind of field. I have no strong opinion on either side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am very happy to remove this code as a fast-follow! I wanted to make as functionally minimal a change as possible here, even if it meant adding more code. I'll follow up with an issue and a PR later this week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AetherUnbound Just checking, were you able to create the issue?
@krysal I guess generally there is a rationale of removing code that isn't used. All code is a liability, either as a vulnerability or increased maintenance cost (the latter being most relevant and in fact exemplified here), so as much of it as we don't need is a good idea to remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created in #4320
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. It's surprisingly nice how seamless the change is, and good to know the unique constraints are not affected.
# As with CharField, this will cause URL validation to be performed | ||
# twice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Twice meaning it would happen in the form and at the database level?
class URLTextField(models.TextField): | ||
"""URL field which uses the underlying Postgres TEXT column type.""" | ||
|
||
default_validators = [validators.URLValidator()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an interesting point. I can't think of a use case right now, but I also think it doesn't hurt to leave the validation for this kind of field. I have no strong opinion on either side.
Fixes
Fixes #4311 by @AetherUnbound
Description
This PR converts the listed fields from an underlying
character varying
type to thetext
type. In some cases, URL validation was used for fields. I created a custom field type which mirrorsURLField
except that it inherits fromTextField
rather thanCharField
.Below is the SQL migration for this change. There are several indices that are dropped, but note that these are only the
_like
indices that optimize forvarchar_pattern_ops
. Since we perform all our searching with Elasticsearch and still have the standard indices, I think it's totally fine to have them be removed (and may even speed up certain operations).For instance, here's the indices on the
audio
table:Only the following indices would be removed:
audio_foreign_identifier_617f66ad_like
audio_url_b6a832d3_like
Each of these has a standard
btree
index which would remain after the operation.Testing Instructions
CI should pass, since this is essentially a semantic underlying database change.
Checklist
Update index.md
).main
) or a parent feature branch.just catalog/generate-docs
for catalogPRs) or the media properties generator (
just catalog/generate-docs media-props
for the catalog or
just api/generate-docs
for the API) where applicable.Developer Certificate of Origin
Developer Certificate of Origin