WIP: Multilabel classification #440
Conversation
I'll look into it, thanks @isaac-chung!!
Looks really good @x-tabdeveloping. A few points of discussion, but testing it on a task seems like the best next step.
I'm currently in the process of adding EURLEX.
…step outside the evaluator and encoding every possible training sentence before running the evaluation.
Currently this PR assumes that all labels in the classification are independent of each other. Some options we could consider that would fix this:
What do you guys think @KennethEnevoldsen @imenelydiaker @isaac-chung ?
I'm currently in the process of running MultiEURLEX on my machine, this might take a fair bit :D
My immediate assumption is just to go for simplicity and then we can always expand to other cases in the future.
Regarding the points: Do we count
I have been running the task basically all day on UCloud on the two models, it takes a ridiculous amount of time.
Running on UCloud again, should be able to submit results within a day.
@x-tabdeveloping feel free to merge it in once it is done running!
I made the neural network smaller and introduced stratified subsampling for the test set so that it runs faster, I will try to do a rerun.
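For readers unfamiliar with multilabel stratification: scikit-learn's `train_test_split(stratify=...)` does not accept a multilabel matrix directly, so one simple way to approximate stratified subsampling is to group documents by their exact label combination and sample a fixed fraction from each group. This is only an illustrative sketch with synthetic data, not the scheme used in the PR:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical binary label matrix of shape (n_documents, n_labels)
y = (rng.random((1000, 6)) < 0.2).astype(int)

# Group documents by their exact label combination and sample a fixed
# fraction from each group, so rare label combinations still appear
# in the subsample (at least one document per combination is kept).
frac = 0.1
groups = {}
for i, row in enumerate(y):
    groups.setdefault(tuple(row), []).append(i)

idx = []
for members in groups.values():
    k = max(1, round(len(members) * frac))
    idx.extend(rng.choice(members, size=k, replace=False).tolist())

y_sub = y[np.array(sorted(idx))]
```

Exact label-set grouping can explode combinatorially when there are many labels; iterative stratification (as in the `skmultilearn` package) is the more principled alternative in that case.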
For what it's worth, it might help to use a small dataset for debugging.
Yea using a smaller dataset for test seems like the right approach.
Hmm any idea about what part is slow? Is it simply running the trained model on the test set? (in which case reducing the test set might be an option)
Doing a baseline using a logistic regression on each label is probably a good idea.
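A per-label logistic regression baseline can be sketched with scikit-learn's `MultiOutputClassifier`, which fits one independent classifier per label column. The embeddings and labels below are synthetic placeholders, not EURLEX data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 16))                 # hypothetical sentence embeddings
y_train = (rng.random((200, 3)) < 0.4).astype(int)   # hypothetical binary label matrix
X_test = rng.normal(size=(50, 16))

# One independent logistic regression per label column
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)  # shape (n_test, n_labels)
```

Note this baseline shares the PR's assumption that labels are independent; it cannot model correlations between labels.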
Something's not right with these scores, I will take a deep dive.
I ran EURLEX in English with all-MiniLM-L6 with multiple classifiers. My suggestion is that we roll back to kNN and make LRAP the main score, what do you think @KennethEnevoldsen?
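For context on the proposal above: scikit-learn's `KNeighborsClassifier` supports multilabel targets natively, and LRAP (label ranking average precision) scores the ranking of labels by predicted probability rather than hard 0/1 predictions. A minimal sketch with synthetic data, assuming both classes occur for every label in training:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import label_ranking_average_precision_score

rng = np.random.default_rng(42)
X_train = rng.normal(size=(300, 8))                  # hypothetical embeddings
y_train = (rng.random((300, 4)) < 0.3).astype(int)
X_test = rng.normal(size=(60, 8))
y_test = (rng.random((60, 4)) < 0.3).astype(int)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# For multilabel input, predict_proba returns a list of (n_samples, 2)
# arrays, one per label; take each positive-class column to build the
# score matrix that LRAP expects.
scores = np.stack([p[:, 1] for p in knn.predict_proba(X_test)], axis=1)
lrap = label_ranking_average_precision_score(y_test, scores)
```

LRAP lies in (0, 1], with 1 meaning every true label is ranked above every false one, which makes it a natural headline metric when hard thresholds are unreliable.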
Also including Dummy classifier scores gives us a relatively good idea of chance level in this multilabel case.
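Estimating that chance level can be sketched with scikit-learn's `DummyClassifier`, which accepts multilabel targets; the `"stratified"` strategy predicts by sampling from the training label distribution, so any metric computed on its output gives an empirical chance baseline. Again, the data here is synthetic:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))                        # hypothetical embeddings
y = (rng.random((200, 5)) < 0.25).astype(int)        # hypothetical label matrix

# "stratified" samples predictions from the training label distribution,
# ignoring X entirely, which is exactly what "chance level" means here.
dummy = DummyClassifier(strategy="stratified", random_state=0)
dummy.fit(X, y)
chance_f1 = f1_score(y, dummy.predict(X), average="micro")
```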
…made switching out classifiers more flexible
…ings-benchmark/mteb into multilabel-classification
I would not include it in the task, but it might be interesting to just have a "random" model as a baseline.
E5 definitely performs better on the task than paraphrase-multilingual. I'm not sure about the subcategories, might be a bit too much for some tasks. Though we could include it if need be.
Also, specific tasks are free to use whatever they want; if you find an MLP a better fit, you can specify it in the task.
I believe it is fine to merge.
Working on #434.
I will still have to add a good test task, if anyone has one don't hesitate to comment.