Lost in the documentation - Get the current best model - classification head combination #1441

Open
csipapicsa opened this issue Oct 12, 2024 · 2 comments

@csipapicsa

Hi,

I'm trying to identify the most accurate models for classifying moods such as 'happy' in music. I've noticed some inconsistencies between the model listings from different years, and I'm a bit confused about how to proceed.

From what I gathered, a 2020 post on the Essentia Labs website points to specific TensorFlow models for mood classification: 2020 TensorFlow Models Released - Essentia Labs

Additionally, I found a specific model for 'happy' mood classification detailed here: Mood Happy Classifier - Musicnn MSD

However, a more recent 2022 listing on the main Essentia models page seems to include updated or different models: Essentia Models 2022 - Mood Happy

I also noticed that the same embedding model is used for different tasks, which is adding to my confusion:

```python
from essentia.standard import TensorflowPredictVGGish
embedding_model = TensorflowPredictVGGish(graphFilename="audioset-vggish-3.pb", output="model/vggish/embeddings")
```

Could someone clarify which models and classification heads are currently considered the most accurate for detecting moods like 'happy' in music? Any guidance on how to select and use these models effectively would be greatly appreciated. I want to use several models for my master's thesis, so any help is welcome!

Thank you!

@palonso
Contributor

palonso commented Oct 14, 2024

Hi @csipapicsa,
According to our internal metrics, the happy classifier based on Discogs-EffNet embeddings performs better than the others. Here is a code snippet to get predictions with this model:

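```python
from essentia.standard import MonoLoader, TensorflowPredictEffnetDiscogs, TensorflowPredict2D

# Load the audio as mono at 16 kHz, the sample rate the embedding model expects
audio = MonoLoader(filename="audio.wav", sampleRate=16000, resampleQuality=4)()
# Compute Discogs-EffNet embeddings (one vector per audio patch)
embedding_model = TensorflowPredictEffnetDiscogs(graphFilename="discogs-effnet-bs64-1.pb", output="PartitionedCall:1")
embeddings = embedding_model(audio)

# Apply the happy classification head to the embeddings
model = TensorflowPredict2D(graphFilename="mood_happy-discogs-effnet-1.pb", output="model/Softmax")
predictions = model(embeddings)  # one row of class probabilities per patch
```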

Remember that to run this code, you must download the model files (*.pb) and set the graphFilename parameter accordingly.
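The head returns one row of probabilities per audio patch rather than a single value per track. Below is a minimal sketch for collapsing those rows into one clip-level score per class. It assumes that averaging over patches suits your use case and that the model's JSON metadata file (e.g. `mood_happy-discogs-effnet-1.json`, downloaded next to the `.pb`) lists the labels under `"classes"`; reading the class order from there avoids guessing which column is 'happy'. `clip_level_scores` and `load_classes` are hypothetical helper names.

```python
import json

import numpy as np


def clip_level_scores(predictions, classes):
    """Average frame-wise head outputs into one score per class.

    predictions: array of shape (n_frames, n_classes), e.g. the output of
        TensorflowPredict2D.
    classes: class names in column order.
    """
    predictions = np.asarray(predictions, dtype=float)
    if predictions.ndim != 2 or predictions.shape[1] != len(classes):
        raise ValueError("predictions must have shape (n_frames, len(classes))")
    means = predictions.mean(axis=0)  # average over frames (time axis)
    return dict(zip(classes, means.tolist()))


def load_classes(metadata_path):
    # Assumption: the model's JSON metadata stores its labels under "classes".
    with open(metadata_path) as f:
        return json.load(f)["classes"]
```

Usage, under the same assumptions: `scores = clip_level_scores(predictions, load_classes("mood_happy-discogs-effnet-1.json"))`.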

Additionally, note that some of our auto-tagging classifiers (MTG-Jamendo, MSD, MTT) also predict the happy tag. You can experiment with these predictions and choose the most suitable for your use case.
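To try the happy tag from several auto-taggers, a small helper can pull that one column out of each model's output. This is a sketch assuming each model's predictions are an `(n_frames, n_classes)` array and its label list comes from the model's metadata; `tag_score` and `happy_scores` are hypothetical names, and models without the tag map to `None`.

```python
import numpy as np


def tag_score(predictions, classes, tag="happy"):
    """Mean activation of one tag over all frames, or None if the model lacks it."""
    if tag not in classes:
        return None
    column = list(classes).index(tag)
    return float(np.asarray(predictions, dtype=float)[:, column].mean())


def happy_scores(outputs, tag="happy"):
    """outputs maps a model name to a (predictions, classes) pair."""
    return {name: tag_score(p, c, tag) for name, (p, c) in outputs.items()}
```

The raw activations of different models are not calibrated against each other, so a higher mean does not indicate a more accurate model; checking each one against a few annotated tracks is a fairer comparison.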

Best,
Pablo.

@csipapicsa
Author

Hi @palonso,

Thanks for the answer! So are the models on this page the most up-to-date ones?

https://essentia.upf.edu/models.html

If I want to know which one is best, I assume I need to check each classifier's metadata to see its accuracy, right? Is there a page that summarizes them?
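If the metadata files do carry evaluation numbers, one way to build a summary is to scan the downloaded JSON files. This sketch assumes, without confirmation, that reported scores sit under a `dataset` entry with a `metrics` sub-entry; files without it come back as an empty dict. `summarize_metrics` is a hypothetical helper.

```python
import json
from pathlib import Path


def summarize_metrics(metadata_dir):
    """Collect reported metrics from every model metadata JSON in a folder.

    Assumption: scores, if present, live under meta["dataset"]["metrics"].
    """
    summary = {}
    for path in sorted(Path(metadata_dir).glob("*.json")):
        with path.open() as f:
            meta = json.load(f)
        dataset = meta.get("dataset")
        if not isinstance(dataset, dict):
            dataset = {}
        metrics = dataset.get("metrics")
        summary[path.stem] = metrics if isinstance(metrics, dict) else {}
    return summary
```

Metrics computed on different datasets or with different measures are not directly comparable, so treat such a summary as a starting point rather than a ranking.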

Another question: can the same embedding model be used for several tasks? For example, if I load "discogs-effnet-bs64-1.pb", do I only need to swap the classification head, which is usually quite light (around 500 KB)?
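If the answer is yes, the workflow would compute the embeddings once and feed them to several heads. A minimal sketch under that assumption, written against plain callables so it does not depend on which Essentia classes are passed in; `predict_with_heads` is a hypothetical name.

```python
def predict_with_heads(audio, embedding_model, heads):
    """Compute embeddings once, then apply several lightweight heads to them.

    embedding_model: callable mapping audio to embeddings.
    heads: dict mapping a task name to a callable that takes the embeddings.
    """
    embeddings = embedding_model(audio)  # the expensive step, done only once
    return {task: head(embeddings) for task, head in heads.items()}
```

With Essentia, `embedding_model` could be the `TensorflowPredictEffnetDiscogs` instance from the snippet above and `heads` a dict such as `{"happy": TensorflowPredict2D(graphFilename="mood_happy-discogs-effnet-1.pb", output="model/Softmax")}` plus further heads. Each head must have been trained on the embedding model it is fed (here, `-discogs-effnet` heads), and its output node name should be checked in its metadata.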
