Fine tuned bert LM #2

Open
zparcheta opened this issue Jun 8, 2020 · 2 comments
Comments

zparcheta commented Jun 8, 2020

Hi,
I use run_lm_finetuning.py from pytorch_pretrained_BERT/examples to fine-tune the model on a monolingual set of sentences, starting from the BERT multilingual cased model.

Once the model is fine-tuned, I compute a score (the exponentiated loss) for a given sentence with the following code:

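import math
import torch
from pytorch_pretrained_bert import BertForMaskedLM, BertTokenizer

def get_score(sentence, model):
    # Note: no [CLS]/[SEP] tokens are added and no token is masked.
    tokenize_input = tokenizer.tokenize(sentence)
    tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
    model.eval()
    with torch.no_grad():
        predictions = model(tensor_input)  # logits: (1, seq_len, vocab_size)
    loss_fct = torch.nn.CrossEntropyLoss()
    loss = loss_fct(predictions.squeeze(0), tensor_input.squeeze(0)).item()
    return math.exp(loss)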
sentence = "ﺶﻋﺮﺴﺗﺎﻧ؛ ﺩ پښﺕﻭ ﺶﻋﺭپﻮﻬﻧې ﻥﻭی پړﺍﻭ - ﺕﺎﻧﺩ"
tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
stats=torch.load('pytorch_model.bin')
bertMaskedLM = BertForMaskedLM.from_pretrained('bert-base-multilingual-cased', state_dict=stats)

print(get_score(sentence, bertMaskedLM))

78637.05198167797

bertMaskedLM_orig = BertForMaskedLM.from_pretrained('bert-base-multilingual-cased')
print(get_score(sentence, bertMaskedLM_orig))

7.919475431571431

The strange thing is that the fine-tuned model returns a much higher score (the exponentiated loss), even when the evaluated sentence appeared in the monolingual training data.

Am I doing something wrong? I just want to check how well a given sentence fits the LM.

Regards and thanks in advance
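
One common way to check how well a sentence fits a masked LM such as BERT is to mask one token at a time and score the true token at the masked position, often called pseudo-perplexity, instead of feeding the unmasked sentence. The following is only a minimal sketch of that idea, not the method used in this thread. The names pseudo_perplexity, make_bert_scorer, and the masked_log_prob callback are hypothetical. The BERT wrapper assumes the pytorch_pretrained_bert behaviour where calling the model without labels returns a logits tensor of shape (1, seq_len, vocab_size).

```python
import math
from typing import Callable, Sequence


def pseudo_perplexity(
    token_ids: Sequence[int],
    masked_log_prob: Callable[[Sequence[int], int, int], float],
    mask_id: int,
) -> float:
    """Mask each token in turn and average the log-probability of the true token.

    masked_log_prob(masked_ids, position, true_id) is a hypothetical callback
    that returns log P(true_id | masked_ids) at `position`.
    """
    if not token_ids:
        raise ValueError("token_ids must not be empty")
    total = 0.0
    for pos, true_id in enumerate(token_ids):
        masked = list(token_ids)
        masked[pos] = mask_id  # hide the token being scored
        total += masked_log_prob(masked, pos, true_id)
    return math.exp(-total / len(token_ids))


def make_bert_scorer(model, cls_id: int, sep_id: int):
    """Wrap a BertForMaskedLM as a masked_log_prob callback.

    Assumption: model(input_ids) returns logits of shape (1, seq_len, vocab_size).
    """
    import torch  # imported lazily so the pure-Python part runs without torch

    model.eval()

    def masked_log_prob(masked_ids, position, true_id):
        ids = torch.tensor([[cls_id, *masked_ids, sep_id]])
        with torch.no_grad():
            logits = model(ids)
        # +1 skips the [CLS] token prepended above
        log_probs = torch.log_softmax(logits[0, position + 1], dim=-1)
        return log_probs[true_id].item()

    return masked_log_prob


def uniform_log_prob(masked_ids, position, true_id):
    """Toy stand-in for a model: uniform over a 10-token vocabulary."""
    return math.log(1 / 10)


# A uniform model over 10 tokens should give a pseudo-perplexity of about 10.
print(pseudo_perplexity([3, 1, 4], uniform_log_prob, mask_id=0))
```

With a real model, the ids for [CLS], [SEP], and [MASK] could be obtained from tokenizer.convert_tokens_to_ids, and the scorer from make_bert_scorer would be passed as masked_log_prob. This costs one forward pass per token, so it is slower than a single unmasked pass.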

@pangbochen

I suggest using https://github.com/huggingface/transformers

This repo is a copy of Hugging Face's project; pytorch_pretrained_BERT has since been renamed to transformers there.

@pangbochen

See the original code at
https://github.com/huggingface/transformers/tree/0.5.0

Best regards
