
Add Bert mesh experiments #160

Merged: 32 commits into main, Feb 25, 2022

Conversation

nsorros (Contributor) commented Nov 24, 2021

Description

Fixes #127

Checklist

  • Linked to Notion or GitHub issue
  • Added tests
  • Updated README
  • DVC up to date

@nsorros nsorros changed the title Bert mesh Add Bert mesh experiments Nov 24, 2021

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
X_vec = tokenizer(X, truncation=True, padding=True)["input_ids"]
X_vec = np.array(X_vec)
Contributor:

Why transform it to an np.array here? You can make the same .shape calls and also save it as a tensor.

nsorros (Author):

I guess so that we can use it for both TensorFlow and PyTorch training.
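
For illustration, a minimal sketch of this point, with hypothetical example texts: ids stored as a NumPy array can feed either framework later.

```python
import numpy as np
import torch
from transformers import BertTokenizerFast

# Sketch of the idea discussed above; `X` is a hypothetical list of texts.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
X = ["first abstract", "second abstract"]
X_vec = np.array(tokenizer(X, truncation=True, padding=True)["input_ids"])

torch_inputs = torch.from_numpy(X_vec)  # PyTorch: zero-copy conversion
# TensorFlow would accept the array directly, e.g. tf.constant(X_vec).
```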

epochs: int=5, pretrained_model="bert-base-uncased"):
dataset = MeshDataset(x_path, y_path)

model = AutoModelForSequenceClassification.from_pretrained(pretrained_model, num_labels=dataset.num_labels)
Contributor:
Did you check that this trains OK on GPU?
(You might have to call .to(device) on the model and the inputs.)
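
A minimal sketch of the reviewer's suggestion; `model` is the AutoModelForSequenceClassification loaded above, and `data_loader` is a hypothetical DataLoader over MeshDataset yielding (inputs, labels) in the format the model expects.

```python
import torch

# Move the model to GPU if one is available, then move each batch as well.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

for inputs, labels in data_loader:
    inputs, labels = inputs.to(device), labels.to(device)
    loss = model(input_ids=inputs, labels=labels).loss
    loss.backward()
```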

@nsorros nsorros changed the base branch from master to main December 7, 2021 09:54
@nsorros nsorros mentioned this pull request Dec 8, 2021
nsorros (Author) commented Dec 12, 2021

Results have not been submitted so far. The first PubMed BERT run gave 0.40, and with multi-label attention, 0.49.

nsorros (Author) commented Dec 12, 2021

Example from Weights & Biases tracking GPU utilization out of the box:

[screenshot: Weights & Biases GPU utilization dashboard]

super().__init__()
self.pretrained_model = pretrained_model
self.num_labels = num_labels
self.multilabel_attention = multilabel_attention
pdan93 (Contributor) commented Dec 13, 2021:

Is this line needed if you overwrite it two lines below? Or maybe an if statement was intended there but forgotten?

nsorros (Author):

🐛
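
A minimal sketch of the likely fix, assuming the attention layer should only be built when the flag is set; `use_multilabel_attention` is a hypothetical rename so the boolean is not clobbered by the module assigned two lines later.

```python
import torch

class BertMesh(torch.nn.Module):
    def __init__(self, pretrained_model, num_labels, multilabel_attention=False):
        super().__init__()
        self.pretrained_model = pretrained_model
        self.num_labels = num_labels
        # Keep the flag and the layer under different names.
        self.use_multilabel_attention = multilabel_attention
        if self.use_multilabel_attention:
            # MultiLabelAttention is defined elsewhere in this PR.
            self.multilabel_attention = MultiLabelAttention(768, num_labels)
        self.linear_1 = torch.nn.Linear(768, 512)
```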

self.multilabel_attention = MultiLabelAttention(
768, num_labels
) # num_labels, 768
self.linear_1 = torch.nn.Linear(768, 512) # num_labels, 512
pdan93 (Contributor) commented Dec 13, 2021:

Consider using the Sequential API, nn.Sequential(...), for these three layers?
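
A sketch of the suggestion; the layer sizes follow the comments in the diff (768 -> 512 -> num_labels), but the final layer is an assumption.

```python
import torch

num_labels = 28_000  # hypothetical label count, for illustration

# Linear -> ReLU -> Linear packed into a single module.
classifier = torch.nn.Sequential(
    torch.nn.Linear(768, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, num_labels),
)
```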

nsorros (Author):

👍

nsorros (Author):

Actually, in this case we use different layers depending on whether we add multilabel attention or not; not all of them are used in each path, so keeping them separate reduces duplication. Generally a good shout, though.

else:
cls = self.bert(input_ids=inputs)[1]
outs = torch.nn.functional.relu(self.linear_1(cls))
outs = torch.nn.functional.relu(self.linear_2(outs))
nsorros (Author):

This should be removed, as it is only relevant for multilabel attention.
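
Putting the two review points together, a sketch of what the branched forward might look like; this is a method body only, and the multilabel-attention path and the `linear_out` projection are assumptions for illustration, not the PR's actual code.

```python
import torch

def forward(self, inputs):
    if self.use_multilabel_attention:
        hidden_states = self.bert(input_ids=inputs)[0]   # per-token states
        outs = self.multilabel_attention(hidden_states)  # per-label context
        outs = torch.nn.functional.relu(self.linear_1(outs))
        outs = torch.nn.functional.relu(self.linear_2(outs))
    else:
        cls = self.bert(input_ids=inputs)[1]             # pooled [CLS] vector
        outs = torch.nn.functional.relu(self.linear_1(cls))
    return torch.sigmoid(self.linear_out(outs))
```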

nsorros (Author) commented Jan 10, 2022

Opening this PR for review. Note that a flake8 error has surfaced a bug 🐛 in how I evaluate the snapshot ensemble.

The best result is achieved in commit 4c2f68e

aCampello (Contributor):

Results look good; it would be good to include time to train.

aCampello (Contributor):

What's your intention with this PR: merge as is, or will you still work more on it?

nsorros (Author) commented Jan 17, 2022

> What's your intention with this PR: merge as is, or will you still work more on it?

This should be merged as is; it represents all the experiments I have run so far. It is experimental and not production-ready, i.e. there is no class abstraction representing BertMesh and the code duplicates some things, but it serves the purpose of moving fast towards trying a couple of ideas from the literature. It should not be that far from production-ready anyway.

aCampello (Contributor) left a comment:

I had read this PR before; I don't know why I didn't approve.

How long did it take to train? I am keen to try to reproduce this with my AWS account.

@@ -0,0 +1 @@
{"p": 0.6955707011887685, "r": 0.569327457868485, "f1": 0.626149221959104, "th": 0.5}
Contributor:

It'd be nice to have training information, such as time to train, if I want to reproduce this.

nsorros (Author):

Sure, it can be added. To accelerate experimentation I was not using our main train script, but I can add it easily.

nsorros (Author) commented Feb 25, 2022

> I had read this PR before; I don't know why I didn't approve.
>
> How long did it take to train? I am keen to try to reproduce this with my AWS account.

Well, actually, there is a good opportunity to do this as soon as I change the model to produce a Hugging Face model, which will make uploading to the hub super easy. So let me open this PR today and you can run it. It takes ~4 days on a g4dn.metal instance. You can track progress with wandb.
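
A minimal sketch of tracking such a run with wandb; the project name, config, and the `train_one_epoch` helper are hypothetical.

```python
import wandb

# Start a tracked run; config records the hyperparameters for reproducibility.
wandb.init(project="bert-mesh", config={"pretrained_model": "bert-base-uncased"})

for epoch in range(5):
    train_loss = train_one_epoch()  # hypothetical training helper
    wandb.log({"epoch": epoch, "loss": train_loss})
```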

@nsorros nsorros merged commit 087c7da into main Feb 25, 2022
@nsorros nsorros deleted the bert-mesh branch February 25, 2022 06:41
Successfully merging this pull request may close these issues.

Train default BertMesh