
tested on test_split and got very low accuracy using pretrained weights #4

Open
yen52205 opened this issue Oct 12, 2021 · 10 comments

@yen52205

I tested inference_batch.py on dataset/split/test.csv and got 0.57, 0.744, and 0.744 accuracy on AV/A/V, respectively.
The models I used were downloaded from https://drive.google.com/u/0/uc?id=1L_NOVKCElwcYUEAKp1-FZj_G6Hcq2g2c&export=download (as provided in README.md).

@yen52205 yen52205 changed the title tested on test_split and got very low accuracy tested on test_split and got very low accuracy using pretrained weights Oct 19, 2021
@seungheondoh
Owner

Oh, I will check again. Which type of classifier is it? (audio, remi, magenta)

@yen52205
Author

yen52205 commented Oct 29, 2021

Thanks.
It's the magenta type.
The results I mentioned (0.57/0.744/0.744) were computed as correctly classified clips / total test clips.
Is this the same way you compute accuracy?
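Concretely, a minimal sketch of that computation (the labels here are only illustrative, not from the actual test set):

```python
# Accuracy = correctly classified clips / total test clips.
# "Q1".."Q4" stand in for EMOPIA's arousal/valence quadrant labels.
preds  = ["Q1", "Q2", "Q3", "Q4"]
labels = ["Q1", "Q2", "Q4", "Q4"]
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
print(accuracy)  # 0.75
```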

@seungheondoh
Owner

seungheondoh commented Oct 29, 2021

[screenshot: re-checked performance results]

I re-checked the performance, and there is no performance decrease. I think the difference comes from the global seed!
Please check & run https://github.com/SeungHeonDoh/EMOPIA_cls/blob/main/midi_cls/train_test.py with the best hparams.yaml,

or simply add a global seed in your script:

```python
from pytorch_lightning import seed_everything

if args.reproduce:       # args comes from the script's argparse flags
    seed_everything(42)  # seeds Python, NumPy, and PyTorch RNGs
```

@yen52205
Author

> [screenshot: re-checked performance results]
>
> I re-checked the performance, and there is no performance decrease. I think the difference comes from the global seed! Please check & run https://github.com/SeungHeonDoh/EMOPIA_cls/blob/main/midi_cls/train_test.py with the best hparams.yaml,
>
> or simply add a global seed in your script:
>
> ```python
> from pytorch_lightning import seed_everything
>
> if args.reproduce:
>     seed_everything(42)
> ```

Thanks!

I didn't set a global seed.
Will the global seed setting influence the inference result, or does it only affect reproducing the training?

I added a global seed to both inference_batch.py and inference.py, but still got weird results.
I simply ran inference_batch.py with the best weights (from README.md) on all the .mid clips, used the CSV produced by inference_batch.py to map against dataset/split/test.csv, and computed how many clips were correctly classified.
But I still got 0.57, 0.744, and 0.744 on AV/A/V, respectively.
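For reference, the mapping step looked roughly like this (a sketch; the column names "name" and "label" are assumptions, and the real headers in the CSVs may differ):

```python
import pandas as pd

# Join the predictions to the ground truth by clip filename, then score.
# Column names here are assumptions; adjust to the actual CSV headers.
test = pd.read_csv("dataset/split/test.csv")
pred = pd.read_csv("1029_seed_arva_all.csv")

merged = test.merge(pred, on="name", suffixes=("_true", "_pred"))
accuracy = (merged["label_true"] == merged["label_pred"]).mean()
print(f"AV accuracy: {accuracy:.3f}")
```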

Here are dataset/split/test.csv and the CSVs produced by inference_batch.py.
Could you please check whether there is anything I didn't notice?
1029_seed_arousal_all.csv
1029_seed_arva_all.csv
1029_seed_valence_all.csv
test.csv

@seungheondoh
Owner

It's very weird. Could you follow the Training from scratch steps (not using inference_batch.py)?

preprocessing.py
train_test.py

@yen52205
Author

I used inference_batch.py because I wanted to test the best weights you provided on the EMOPIA dataset.
Could I use train_test.py to do the same thing (only testing, no training)?

@seungheondoh
Owner

seungheondoh commented Oct 30, 2021

I just want to double-check the result. It is strange that the results differ even when there are no other varying factors. I will check my inference code as well!

@seungheondoh
Owner

seungheondoh commented Oct 30, 2021

tain_test1030.csv
inference1030.csv

With the best weights, I found that the train_test.py and inference.py results were different. I think batch inference and zero padding seem to have affected the performance. There are only 87 test samples, so small differences have a large effect on the overall results.

There is no problem with the best weights. I will modify the inference code to the train_test style soon.
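As a toy illustration of the padding effect (this is not the actual repo model): a classifier that pools over time without masking will average padding frames into its features, so a batch-padded input can flip predictions that sit near a decision boundary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a classifier head: mean-pool over time, then linear.
head = nn.Linear(4, 2)

def predict(x):                      # x: (time_steps, features)
    pooled = x.mean(dim=0)           # padding frames are NOT masked out
    return head(pooled).argmax().item()

clip = torch.randn(10, 4)                        # a 10-step clip
padded = torch.cat([clip, torch.zeros(20, 4)])   # zero-padded to 30 steps

# The zeros dilute the pooled features, so the two calls can disagree.
print(predict(clip), predict(padded))
```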

@yen52205
Author

yen52205 commented Oct 30, 2021

> With the best weights, I found that the train_test.py and inference.py results were different. I think batch inference and zero padding seem to have affected the performance. There are only 87 test samples, so small differences have a large effect on the overall results.
>
> There is no problem with the best weights. I will modify the inference code to the train_test style soon.

Thanks a lot!!
Could you further explain the difference between the two results after you modify this?
@yen52205
Author

Hi, sorry to disturb you.
Did you find the problem that caused them to be different?
Was zero padding interfering with the results in inference_batch.py?
