WIP: Multilabel classification #440
Conversation
I'll look into it, thanks @isaac-chung!!
Looks really good @x-tabdeveloping. A few points of discussion, but testing it on a task seems like the best next step.
I'm currently in the process of adding EURLEX.
…step outside the evaluator and encoding every possible training sentence before running the evaluation.
Currently this PR assumes that all labels in the classification are independent of each other. Some options we could consider that would fix this:
What do you guys think @KennethEnevoldsen @imenelydiaker @isaac-chung ?
I'm currently in the process of running MultiEURLEX on my machine, this might take a fair bit :D
My immediate assumption is just to go for simplicity and then we can always expand to other cases in the future.
Regarding the points: Do we count
I have been running the task basically all day on UCloud on the two models, it takes a ridiculous amount of time.
Running on UCloud again, should be able to submit results within a day.
@x-tabdeveloping feel free to merge it in once it is done running!
I made the neural network smaller and introduced stratified subsampling for the test set so that it runs faster, I will try to do a rerun.
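For readers unfamiliar with multilabel stratification: scikit-learn's `train_test_split(stratify=...)` does not accept a multilabel matrix directly, so one simple way to approximate stratified subsampling is to group documents by their exact label combination and sample a fixed fraction from each group. This is only an illustrative sketch with synthetic data, not the scheme used in the PR:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical binary label matrix of shape (n_documents, n_labels)
y = (rng.random((1000, 6)) < 0.2).astype(int)

# Group documents by their exact label combination and sample a fixed
# fraction from each group, so rare label combinations still appear
# in the subsample (at least one document per combination is kept).
frac = 0.1
groups = {}
for i, row in enumerate(y):
    groups.setdefault(tuple(row), []).append(i)

idx = []
for members in groups.values():
    k = max(1, round(len(members) * frac))
    idx.extend(rng.choice(members, size=k, replace=False).tolist())

y_sub = y[np.array(sorted(idx))]
```

Exact label-set grouping can explode combinatorially when there are many labels; iterative stratification (as in the `skmultilearn` package) is the more principled alternative in that case.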
For what it's worth, it might help to use a small dataset for debugging.
Yea using a smaller dataset for test seems like the right approach.
Hmm any idea about what part is slow? Is it simply running the trained model on the test set? (in which case reducing the test set might be an option)
Doing a baseline using a logistic regression on each label is probably a good idea.
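A per-label logistic regression baseline can be sketched with scikit-learn's `MultiOutputClassifier`, which fits one independent classifier per label column. The embeddings and labels below are synthetic placeholders, not EURLEX data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 16))                 # hypothetical sentence embeddings
y_train = (rng.random((200, 3)) < 0.4).astype(int)   # hypothetical binary label matrix
X_test = rng.normal(size=(50, 16))

# One independent logistic regression per label column
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)  # shape (n_test, n_labels)
```

Note this baseline shares the PR's assumption that labels are independent; it cannot model correlations between labels.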
Something's not right with these scores, I will take a deep dive.
I ran EURLEX in English with all-MiniLM-L6 with multiple classifiers. My suggestion is that we roll back to kNN and make LRAP the main score, what do you think @KennethEnevoldsen?
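For context on the proposal above: scikit-learn's `KNeighborsClassifier` supports multilabel targets natively, and LRAP (label ranking average precision) scores the ranking of labels by predicted probability rather than hard 0/1 predictions. A minimal sketch with synthetic data, assuming both classes occur for every label in training:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import label_ranking_average_precision_score

rng = np.random.default_rng(42)
X_train = rng.normal(size=(300, 8))                  # hypothetical embeddings
y_train = (rng.random((300, 4)) < 0.3).astype(int)
X_test = rng.normal(size=(60, 8))
y_test = (rng.random((60, 4)) < 0.3).astype(int)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# For multilabel input, predict_proba returns a list of (n_samples, 2)
# arrays, one per label; take each positive-class column to build the
# score matrix that LRAP expects.
scores = np.stack([p[:, 1] for p in knn.predict_proba(X_test)], axis=1)
lrap = label_ranking_average_precision_score(y_test, scores)
```

LRAP lies in (0, 1], with 1 meaning every true label is ranked above every false one, which makes it a natural headline metric when hard thresholds are unreliable.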
Also including Dummy classifier scores gives us a relatively good idea of chance level in this multilabel case.
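Estimating that chance level can be sketched with scikit-learn's `DummyClassifier`, which accepts multilabel targets; the `"stratified"` strategy predicts by sampling from the training label distribution, so any metric computed on its output gives an empirical chance baseline. Again, the data here is synthetic:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))                        # hypothetical embeddings
y = (rng.random((200, 5)) < 0.25).astype(int)        # hypothetical label matrix

# "stratified" samples predictions from the training label distribution,
# ignoring X entirely, which is exactly what "chance level" means here.
dummy = DummyClassifier(strategy="stratified", random_state=0)
dummy.fit(X, y)
chance_f1 = f1_score(y, dummy.predict(X), average="micro")
```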
…made switching out classifiers more flexible
…ings-benchmark/mteb into multilabel-classification
I would not include it in the task, but it might be interesting to just have a "random" model as a baseline.
E5 definitely performs better on the task than paraphrase-multilingual. I'm not sure about the subcategories, might be a bit too much for some tasks. Though we could include it if need be.
Also, specific tasks are free to use whatever they want; if you find an MLP a better fit, you can specify it in the task.
I believe it is fine to merge.
Working on #434.
I will still have to add a good test task, if anyone has one don't hesitate to comment.