Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add SyncMetadataSourcePlugin #3393

Closed

Conversation

rhlahuja
Copy link
Contributor

@rhlahuja rhlahuja commented Oct 7, 2019

This code isn't working yet, but gets across the idea of what I'm going after -- simplify the creation of "sync classes" to sync API metadata from plugins that need not be MetadataSourcePlugin subclasses, but which must implement:

  • data_source: str and id_regex: dict attributes
  • track_for_id and album_for_id functions

This class can then be leveraged to easily create plugins for syncing metadata from Discogs, Spotify, Deezer, and Beatport.

Right now, I'm running into the following stacktrace because plugins.find_plugins() is trying to instantiate SyncMetadataSourcePlugin() on its own, when it's only intended to be instantiated in subclasses with arguments (see subclass implementations). I haven't dug into this much, but suspect there's a better way to implement this, perhaps using metaclasses.

I plan to dig into this more later, but wanted to throw it out there in case @sampsyo or others have suggestions of better ways to structure this.

Traceback (most recent call last):
  File "/Users/rahulahuja/Dropbox/.virtualenvs/rhlahuja/beets/bin/beet", line 11, in <module>
    load_entry_point('beets', 'console_scripts', 'beet')()
  File "/Users/rahulahuja/git/rhlahuja/beets/beets/ui/__init__.py", line 1267, in main
    _raw_main(args)
  File "/Users/rahulahuja/git/rhlahuja/beets/beets/ui/__init__.py", line 1250, in _raw_main
    subcommands, plugins, lib = _setup(options, lib)
  File "/Users/rahulahuja/git/rhlahuja/beets/beets/ui/__init__.py", line 1136, in _setup
    plugins = _load_plugins(config)
  File "/Users/rahulahuja/git/rhlahuja/beets/beets/ui/__init__.py", line 1122, in _load_plugins
    plugins.send("pluginload")
  File "/Users/rahulahuja/git/rhlahuja/beets/beets/plugins.py", line 490, in send
    for handler in event_handlers()[event]:
  File "/Users/rahulahuja/git/rhlahuja/beets/beets/plugins.py", line 473, in event_handlers
    for plugin in find_plugins():
  File "/Users/rahulahuja/git/rhlahuja/beets/beets/plugins.py", line 309, in find_plugins
    _instances[cls] = cls()
TypeError: __init__() missing 2 required positional arguments: 'command_name' and 'metadata_source_class'

@sampsyo
Copy link
Member

sampsyo commented Oct 7, 2019

Awesome; thanks for getting this started!

The problem you're seeing with that error is that beets uses some crazy magic to look for subclasses of BeetsPlugin and load them. There are two ways around this:

  • The clean way: make it a mixin. Instead of inheriting from BeetsPlugin, inherit from object. And make concrete plugin objects explicitly call the methods on the mixin. Instead of directly implementing commands, make a method called sync_command or something and require plugin classes to call that from their commands method.
  • The convenient way: avoid ever importing SyncMetadataSourcePlugin at the top level of a plugin module, as in from beets.plugins import SyncMetadataSourcePlugin. Then the beets plugin loader won't get confused and try to interpret the base class as a proper plugin in itself.

Anyway, this is looking cool so far! It makes me wonder whether the bpsync functionality shouldn't be part of the main Beatport plugin, if most of the logic is centralized…?

@rhlahuja
Copy link
Contributor Author

rhlahuja commented Oct 7, 2019

Thanks for the quick response as always @sampsyo, exactly what I was looking for! Option 1 seems to be working well, although while testing this with Discogs I realized this PR still needs some more work to match the correct release/tracks. I'll try finishing this off later this week or next.

And I agree, the sync plugins for Beatport/Discogs/Spotify/Deezer can be moved into their respective metadata plugin modules, since most of the logic is indeed abstracted away.

Edit: I think you meant integrate the sync functionality into the main plugin classes themselves, that makes more sense!

@rhlahuja
Copy link
Contributor Author

rhlahuja commented Oct 7, 2019

@sampsyo one other question whenever you're free: in the plugins I've recently added, I've been storing album id's and track id's in the mb_albumid and mb_trackid fields, although after reading discussion at #604, it seems that this is discouraged?

Should we instead be adding source-specific fields such as spotify_albumid, spotify_trackid, deezer_albumid, deezer_trackid?

@sampsyo
Copy link
Member

sampsyo commented Oct 8, 2019

That's a good question. #604 actually has a sort of unresolved architectural challenge… the autotagger matching logic relies on mb_albumid and mb_trackid being unique to, for example, avoid reporting duplicate matches. So without a deeper change, we need to keep "polluting" those fields with non-MB IDs, just to keep accidental collisions from appearing. The "right thing to do," of course, would be to change that logic so it doesn't assume that all matches have an MBID while still not sacrificing that deduplication logic.

In the mean time, however, I think it makes sense to forge ahead with the status quo and leave the IDs there. A good next step would be to also put them in special source-specific ID fields. Then, finally, we can imagine cleaning up the "pollution" problem originally reported in #604. But I think the best strategy is to do all of those in separate steps.

@rhlahuja
Copy link
Contributor Author

rhlahuja commented Oct 8, 2019

Ah gotcha, thanks for the context. I do agree that breaking out the source-specific fields is best done in separate steps.

But, I think I'm still missing something here. From what I've seen, it looks like all non-MusicBrainz autotagger plugins set the data_source attribute on their returned TrackInfo objects (not necessarily on the AlbumInfo object itself, but on each track in AlbumInfo.tracks). In fact, I'm currently relying on that that assumption here to identify library tracks from a particular metadata source.

If this holds true, shouldn't the autotagger rely on the combination/tuple of data_source and mb_trackid to be unique, instead of just mb_trackid?

@sampsyo
Copy link
Member

sampsyo commented Oct 8, 2019

Yes, good point! It should probably indeed do that.

@rhlahuja
Copy link
Contributor Author

rhlahuja commented Oct 8, 2019

Along those lines, I think it would also be a good idea to have all the non-MusicBrainz metadata plugins also set the AlbumInfo.data_source attribute in addition to TrackInfo.data_source.

I'm not sure if we have a straightforward way to run UPDATEs on users' databases, but if we did, to go even further we could even "backfill" AlbumInfo.data_source by setting it to the TrackInfo.data_source value (if it's the same for all the album's tracks). This would then allow the autotagger to also use the combination/tuple of mb_albumid and data_source for testing uniqueness.

@sampsyo
Copy link
Member

sampsyo commented Oct 9, 2019

We actually recently added that in #3350!

I think we can do without backfilling the database—the uniqueness is actually only used to distinguish between match objects, not to align with the database itself. That is, it's possible for the MusicBrainz backend to produce multiple AlbumInfo objects for the same release; we filter those out.

@pprkut
Copy link
Contributor

pprkut commented Oct 17, 2019

Side note, not sure it's really related, but probably good to check on the metasync plugin regardless, just in case.
I think the intention of this, correct me if I'm wrong, would be sources for initial tagging on import (like discogs, amazon, musicbrainz, etc) rather than other metadata sources (like itunes, kodi, lastfm, etc), but naming the classes SyncMetadataSourcePlugin or MetadataSourcePlugin might be confusing then

@jtpavlock
Copy link
Contributor

@rhlahuja are you still interested in continuing this?

@rhlahuja
Copy link
Contributor Author

@jtpavlock I am, but unfortunately don't have the time to commit right now. Are you interested in picking it up?

@jtpavlock
Copy link
Contributor

I'm not, I was checking in on the status. I'll just mark this as a draft for now, but feel free to keep working on it at your own pace 😁

@jtpavlock jtpavlock marked this pull request as draft July 14, 2020 21:56
@stale
Copy link

stale bot commented Nov 18, 2020

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 18, 2020
@stale stale bot closed this Nov 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants