Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flag divergent metadata to try and avoid creating inconsistencies in the library #161

Merged
merged 8 commits into from
Jan 19, 2020

Conversation

homsar
Copy link
Contributor

@homsar homsar commented Dec 30, 2019

This responds to a request from @euricaeris on Slack for a way to flag tracks on import when they contain metadata that doesn't match something already in the library but almost does, to avoid misgrouping artists and anime.

A couple of supporting changes were needed to do this neatly—pulling out the functions supporting all-the-anime and all-the-artists into the Track model, and changing role_detail such that if a track title has brackets that look like a role but don't actually match a regex, then the call to role_detail() doesn't raise AttributeError.

I'd have preferred not to have to pass all_anime_titles and all_artists in to metadata_consistency_checks, but couldn't find a neat way of caching it that would be invalidated once update_library returned.

No automated tests, but I tested manually with a songlibrary.xml from a few months ago and ironed out the bugs (and it seems to do what was intended.) There are now unit tests for (I think) all new functionality and some of the existing update_library functionality and all pass.

@homsar
Copy link
Contributor Author

homsar commented Jan 5, 2020

realised I didn't mention this in the commit message or PR—this check is done on import to the library; it uses the warning functionality added in 0749fad, and generates warnings if any of the artists or the anime title is either missing, or both not in the database and also has a Levenshtein closeness of above 0.7 to another artist/anime already in the library (after converting all to lowercase). in the latter case the warning suggests the anime with the highest Levenshtein closeness, for ease of correcting the metadata.

this does make an import somewhat more expensive on the server side, since it now requires generating the full list of artists and anime titles. there is no impact on performance of the rest of the site, however, and all other views remain unchanged (e.g. the track display does not flag up any similar anime).

a possible way to improve performance would be to cache the result of calling all_anime_titles and all_artists, and then manually invalidate that cache from any functions that update the library, and queue a background task to call that function to regenerate the cache. certainly if we wanted to use those functions for anything else (e.g. highlighting similar anime titles on an anime page) then something like that would be necessary to avoid massive load and long load times for listeners.

@homsar
Copy link
Contributor Author

homsar commented Jan 11, 2020

this now has a test suite

since the tests ended up being quite long, I moved the existing tests into a directory so I could use a separate file for the update_library-specific tests. in developing the test suite, I found a bug in the presentation of artists which I squashed, and also found some missing desired functionality (that artists with reversed names are flagged up), which I added.

something that could potentially be added if desired is flagging tracks being added that do not have a Composer.

@colons colons merged commit 82e3f72 into very-scary-scenario:production Jan 19, 2020
@colons
Copy link
Member

colons commented Jan 19, 2020

the server has plenty of spare compute time, this is a very good use of it

i've removed a trailing 'which is' from the warning string, and i was unable to make the non-canonical name order thing trigger (both pictured below), am i missing something?

warnings

@homsar
Copy link
Contributor Author

homsar commented Jan 19, 2020

The "which is" carried an implied "[in the library]", but that was very non-obvious because the tense disagrees with the start of the sentence. The new wording is more concise while conveying the same information. A thought that now occurs to me is that in principle there could potentially be a link to the artist page for that artist at that point.

Artists are checked individually as parsed by Track.artists. Currently (see #108) Oranges and Lemons are two separate artists, which is why there's no warning in that example.

@colons
Copy link
Member

colons commented Jan 19, 2020

ah, that makes sense; @theshillito would a link to the artist page be useful, or are you just gonna dive back into itunes at that point?

i read 'flag up artists written in non-canonical order' as meaning that lists of multiple artists (like Oranges and Lemons) should be consistent; perhaps it was a bad idea to pick the canonical case where that is incorrect as the example

@homsar
Copy link
Contributor Author

homsar commented Jan 19, 2020

Ah I see; no, the intent was to flag when individual humans had their names written in a different way to how it was already in the library (e.g. a track uploaded as Kana Hanazawa when the library is full of Hanazawa Kana).

I don't think I've noticed any problems with long artist lists being shuffled?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants