Poor results #1442

Open
johnbuts opened this issue Oct 15, 2024 · 4 comments

Comments

@johnbuts

johnbuts commented Oct 15, 2024

Hey everyone, thanks in advance for the help.

So I wanted to use some of the instrument detection models, and was not impressed by the results. I fed it a wav file that just had saxophone playing for around a minute and 10 seconds. Here is the code and output I got:

```python
from essentia.standard import MonoLoader, TensorflowPredictEffnetDiscogs, TensorflowPredict2D
import pandas as pd

audio = MonoLoader(filename="other_sax.wav", sampleRate=75000, resampleQuality=4)()
embedding_model = TensorflowPredictEffnetDiscogs(graphFilename="discogs-effnet-bs64-1.pb", output="PartitionedCall:1")
embeddings = embedding_model(audio)

model = TensorflowPredict2D(graphFilename="mtg_jamendo_instrument-discogs-effnet-1.pb")
predictions = model(embeddings)

instruments = [
    'accordion', 'acousticbassguitar', 'acousticguitar', 'bass', 'beat', 'bell', 'bongo', 'brass',
    'cello', 'clarinet', 'classicalguitar', 'computer', 'doublebass', 'drummachine', 'drums',
    'electricguitar', 'electricpiano', 'flute', 'guitar', 'harmonica', 'harp', 'horn', 'keyboard',
    'oboe', 'orchestra', 'organ', 'pad', 'percussion', 'piano', 'pipeorgan', 'rhodes', 'sampler',
    'saxophone', 'strings', 'synthesizer', 'trombone', 'trumpet', 'viola', 'violin', 'voice'
]

df = pd.DataFrame(predictions, columns=instruments)
instrument_sums = df.sum()

top_5_instruments = instrument_sums.sort_values(ascending=False).head(5)

print(top_5_instruments)
```

output:
synthesizer 218.628510
piano 175.836365
drums 113.429985
cello 85.704750
flute 83.436066

Please tell me what I'm doing wrong, thanks.

@palonso
Contributor

palonso commented Oct 15, 2024

Hi @johnbuts
The problem with your script is that MonoLoader's sampleRate parameter should match the model's expected sample rate (16000).
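
For reference, here is a minimal sketch of the script from the original post with only that change applied. It assumes the same two model files sit in the working directory. `predict_instruments` and `MODEL_SAMPLE_RATE` are names made up for this example, and the Essentia import is kept inside the function only so the snippet can be loaded without Essentia installed.

```python
MODEL_SAMPLE_RATE = 16000  # sample rate the model expects, per the comment above


def predict_instruments(filename, sample_rate=MODEL_SAMPLE_RATE):
    """Return per-patch instrument activations for one file (hypothetical helper)."""
    # Imported lazily so that defining the function does not require Essentia.
    from essentia.standard import (
        MonoLoader,
        TensorflowPredict2D,
        TensorflowPredictEffnetDiscogs,
    )

    # MonoLoader resamples the file to `sample_rate` while loading it.
    audio = MonoLoader(filename=filename, sampleRate=sample_rate, resampleQuality=4)()

    # Same graph files and output node as in the original post.
    embedding_model = TensorflowPredictEffnetDiscogs(
        graphFilename="discogs-effnet-bs64-1.pb", output="PartitionedCall:1"
    )
    embeddings = embedding_model(audio)

    model = TensorflowPredict2D(graphFilename="mtg_jamendo_instrument-discogs-effnet-1.pb")
    return model(embeddings)
```

Calling `predict_instruments("other_sax.wav")` would then take the place of the loading and prediction lines in the original script.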

@johnbuts
Author

drums 53.502373
bass 42.751842
electricguitar 40.419640
piano 34.878933
guitar 32.601994

That didn't seem to help. I've been toying around with the sample rate, and nothing really seems to help. Is my code maybe wrong? Like the order of the instruments or something?
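
On the label-order question: one way to rule it out is to read the class names from the model's metadata file instead of typing them by hand, and to check that their count matches the number of prediction columns. This is a rough sketch, assuming the JSON metadata published alongside the `.pb` model has a top-level `classes` list. That key is an assumption and worth confirming by opening the file. `load_labels` and `check_labels` are hypothetical helpers.

```python
import json


def load_labels(metadata_path, key="classes"):
    """Read the ordered class names from a model metadata JSON file.

    The "classes" key is an assumption about the metadata layout.
    """
    with open(metadata_path, encoding="utf-8") as f:
        metadata = json.load(f)
    return list(metadata[key])


def check_labels(labels, predictions):
    """Raise if the label count differs from the number of prediction columns."""
    n_columns = len(predictions[0])
    if len(labels) != n_columns:
        raise ValueError(f"{len(labels)} labels but {n_columns} prediction columns")
    return True
```

If the check passes and the names come straight from the metadata, a shuffled label list is unlikely to be the cause.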

@johnbuts
Author

johnbuts commented Oct 15, 2024

It's like no matter the instrument, it's always really high on synthesizer and piano. For example, I'll put in a violin track and get this:
synthesizer 70.941162
piano 69.683495
drums 65.955116
electricguitar 63.488628
guitar 62.260086

And then I'll put in a saxophone track and get this:
drums 73.532722
synthesizer 72.120262
piano 70.608063
bass 69.452019
electricguitar 66.740585
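
A side note on how these scores are computed: summing the activations makes every score grow with the number of patches, so a longer track gets larger totals for every instrument. Averaging over patches keeps each score on the same scale as a single prediction, which may make tracks easier to compare. Below is a small sketch in plain Python with made-up activation values and a hypothetical `top_k_mean` helper. It only changes how the patches are summarized, not what the model predicts.

```python
def top_k_mean(predictions, labels, k=5):
    """Average activations over patches and return the k highest-scoring labels."""
    n_patches = len(predictions)
    # zip(*predictions) walks the matrix column by column (one column per label).
    means = [sum(column) / n_patches for column in zip(*predictions)]
    ranked = sorted(zip(labels, means), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up activations: 3 patches x 3 labels, for illustration only.
example = [
    [0.25, 0.75, 0.5],
    [0.25, 0.50, 0.5],
    [0.25, 1.00, 0.5],
]
print(top_k_mean(example, ["piano", "saxophone", "drums"], k=2))
```

The same function accepts the 2D array returned by the model, since it only iterates over rows.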

johnbuts reopened this Oct 15, 2024
@csipapicsa

Have you tried setting resampleQuality=0?

resampleQuality (integer ∈ [0, 4], default = 1) :
the resampling quality, 0 for best quality, 4 for fast linear approximation
