Add Bert mesh experiments #160
Conversation
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
X_vec = tokenizer(X, truncation=True, padding=True)["input_ids"]
X_vec = np.array(X_vec)
Why transform it to an np.array here? You can make the same .shape calls and save it as a tensor instead.
I guess so that we can use it for both TensorFlow and PyTorch training.
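A minimal sketch of the reviewer's suggestion, using the standard `transformers` tokenizer API; the texts in `X` are illustrative stand-ins for the inputs in the diff above:

```python
from transformers import BertTokenizerFast

X = ["first abstract ...", "second abstract ..."]  # illustrative texts

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# return_tensors="pt" keeps the ids as a torch tensor; .shape works as on an array
X_vec = tokenizer(X, truncation=True, padding=True, return_tensors="pt")["input_ids"]
print(X_vec.shape)  # (num_examples, max_seq_len)

X_np = X_vec.numpy()  # convert only where NumPy is really needed, e.g. TensorFlow
```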
epochs: int = 5, pretrained_model="bert-base-uncased"):
    dataset = MeshDataset(x_path, y_path)

    model = AutoModelForSequenceClassification.from_pretrained(pretrained_model, num_labels=dataset.num_labels)
Did you check if this trains OK on GPU? (You might have to do .to(device) for the model and the inputs; see the sketch below.)
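A hedged sketch of the device handling the reviewer suggests; `model` and `data_loader` stand in for the PR's model and DataLoader:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)  # move the model's parameters to the GPU once

for inputs, labels in data_loader:
    # every batch must live on the same device as the model
    inputs = inputs.to(device)
    labels = labels.to(device)
    outs = model(input_ids=inputs)
```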
Results have not been submitted so far. The first PubMedBERT run gave 0.40, and with multi-label attention 0.49.
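For context, a hedged sketch of a per-label attention layer in the spirit of the literature the PR draws on (768 is BERT's hidden size); this is illustrative, not necessarily the PR's exact `MultiLabelAttention`:

```python
import torch

class MultiLabelAttention(torch.nn.Module):
    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        # one learned attention vector per label
        self.attention = torch.nn.Parameter(torch.empty(hidden_size, num_labels))
        torch.nn.init.xavier_uniform_(self.attention)

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_size)
        weights = torch.softmax(hidden_states @ self.attention, dim=1)
        # label-specific document representations: (batch, num_labels, hidden_size)
        return weights.transpose(1, 2) @ hidden_states
```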
super().__init__()
self.pretrained_model = pretrained_model
self.num_labels = num_labels
self.multilabel_attention = multilabel_attention
Is this line needed if you overwrite it two lines below? Or maybe an if statement was intended there but forgotten?
🐛
grants_tagger/bertmesh/model.py
Outdated
self.multilabel_attention = MultiLabelAttention(
    768, num_labels
)  # num_labels, 768
self.linear_1 = torch.nn.Linear(768, 512)  # num_labels, 512
Consider using the sequential API, nn.Sequential(...), for these 3 layers?
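A minimal sketch of that suggestion, with the sizes taken from the diff; `num_labels` is illustrative here (in the PR it comes from the dataset):

```python
import torch

num_labels = 10  # illustrative

head = torch.nn.Sequential(
    torch.nn.Linear(768, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, num_labels),
)
# the forward pass then collapses to: outs = head(cls)
```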
👍
Actually, in this case we use different layers depending on whether we add multi-label attention or not; not all of them are used, so this reduces duplication. But generally a good shout.
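A hypothetical sketch of the branching the author describes, reusing the `MultiLabelAttention` sketch above; layer names and sizes are assumptions based on the diff, not the PR's exact code:

```python
import torch

class BertMeshHead(torch.nn.Module):
    """Hypothetical sketch: which head is built depends on the flag,
    so a single Sequential cannot cover both branches."""
    def __init__(self, num_labels: int, multilabel_attention: bool = False):
        super().__init__()
        self.multilabel_attention = multilabel_attention
        if multilabel_attention:
            # per-label attention path: label-specific 768-d vectors, scored one by one
            self.attention = MultiLabelAttention(768, num_labels)
            self.linear_1 = torch.nn.Linear(768, 512)
            self.linear_out = torch.nn.Linear(512, 1)
        else:
            # plain CLS path: project straight down to num_labels logits
            self.linear_1 = torch.nn.Linear(768, 512)
            self.linear_2 = torch.nn.Linear(512, num_labels)
```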
grants_tagger/bertmesh/model.py
Outdated
else:
    cls = self.bert(input_ids=inputs)[1]
    outs = torch.nn.functional.relu(self.linear_1(cls))
    outs = torch.nn.functional.relu(self.linear_2(outs))
This should be removed, as it is only relevant for multi-label attention.
Opening this PR for review. Note that a flake8 error has surfaced a bug 🐛 in how I evaluate the snapshot ensemble. The best result is achieved in commit 4c2f68e.
Results look good; it would be good to include time to train.
What's your intention with this PR: merge as is, or will you still work more on it?
This should be merged as is; it represents all the experiments I have run so far. It is experimental and not production-ready, i.e. there is no class abstraction that represents BertMesh and the code duplicates some things, but it serves the purpose of moving fast towards trying a couple of ideas from the literature. It should also not be that far from being production-ready anyway.
I had read this PR before; I don't know why I didn't approve. How long did it take to train? I am keen to try and reproduce this with my AWS account.
@@ -0,0 +1 @@
{"p": 0.6955707011887685, "r": 0.569327457868485, "f1": 0.626149221959104, "th": 0.5}
It'd be nice to have training information, such as time to train, if I want to reproduce this.
Sure, it can be added. I was not using our main train script, in order to accelerate experimentation, but I can add it easily.
Actually, there is a good opportunity to do this as soon as I change the model to produce a Hugging Face model, which will make uploading to the Hub super easy. So let me open this PR today and you can run it. It takes ~4 days in
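As a hedged illustration of that Hub step, assuming the model is reworked to subclass `transformers.PreTrainedModel` (the repo name is hypothetical):

```python
# model and tokenizer are assumed to be trained transformers objects
model.push_to_hub("my-org/bertmesh")      # repo name is hypothetical
tokenizer.push_to_hub("my-org/bertmesh")
```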
Description
Fixes #127
Checklist