
Class CorefTaggerReview means what? #4

Open
smallsmallwood opened this issue Dec 10, 2018 · 3 comments

Comments

@smallsmallwood

I find that the output of class CorefTagger is a 12-dimensional vector, while the final output y in your paper is a 13-dimensional vector. Why the difference?

Another question: did you test on the ".auto_conll" files in your paper (COLING 2018)?

@ylmeng
Collaborator

ylmeng commented Dec 10, 2018

Sorry for the confusion.
For a triad (a, b, c), we only use the outputs for (a, c) and (b, c) in the current version. So a triad has three pairwise outputs, but we use only two of them for the final predictions. This is more efficient and often more accurate. However, nothing changes inside the neural network compared to the original version: all three pairs still go through the layers. If you use all three pairs, the scores should be very similar, perhaps slightly lower.
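The selection step described above can be sketched as follows (a minimal illustration with hypothetical names, not the authors' actual code): the network scores all three pairs of a triad (a, b, c), but only the two pairs involving c feed the final prediction.

```python
def select_pair_scores(triad_scores):
    """triad_scores maps each pair of a triad (a, b, c) to a coreference score."""
    # All three pairs went through the network layers...
    assert set(triad_scores) == {("a", "b"), ("a", "c"), ("b", "c")}
    # ...but only the pairs involving c are used for the final prediction.
    return {pair: s for pair, s in triad_scores.items() if "c" in pair}

scores = {("a", "b"): 0.2, ("a", "c"): 0.9, ("b", "c"): 0.7}
print(select_pair_scores(scores))  # {('a', 'c'): 0.9, ('b', 'c'): 0.7}
```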

The test set does not have gold_conll files, so we used auto_conll only. We had a bug in the evaluation program for our COLING paper, so those scores are not as good as the current ones. Specifically, separate parts of an article have no coreference between them, but we had assumed coreference could occur across parts, which made the task more difficult.
After we fixed the bug the scores improved, as you can see in the arXiv paper. Please refer to the arXiv version, which corrects some errors. (We tried to update the COLING paper too, but that process takes longer.)
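The fix described above amounts to restricting candidate mention pairs to the same document part. A minimal sketch, assuming a hypothetical data layout (this is not the actual evaluation code):

```python
def candidate_pairs(mentions):
    """mentions: list of (mention_id, part_id). Returns within-part pairs only."""
    pairs = []
    for i, (m1, p1) in enumerate(mentions):
        for m2, p2 in mentions[i + 1:]:
            if p1 == p2:  # coreference never crosses part boundaries
                pairs.append((m1, m2))
    return pairs

mentions = [("m0", 0), ("m1", 0), ("m2", 1)]
print(candidate_pairs(mentions))  # [('m0', 'm1')]
```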

@smallsmallwood
Author

Thanks for your patience.
I also don't understand the role of the torch.max() operation in the following code (I didn't find any analysis of it in your new paper):

word_repr_0, _ = self.Attention(word_lstm_0, torch.cat([word_lstm_1, word_lstm_2], 1))
word_repr_0, _ = torch.max(word_repr_0, dim=1, keepdim=False) # (batch, feature)

word_repr_1, _ = self.Attention(word_lstm_1, torch.cat([word_lstm_0, word_lstm_2], 1))
word_repr_1, _ = torch.max(word_repr_1, dim=1, keepdim=False) # (batch, feature)

word_repr_2, _ = self.Attention(word_lstm_2, torch.cat([word_lstm_0, word_lstm_1], 1))
word_repr_2, _ = torch.max(word_repr_2, dim=1, keepdim=False) # (batch, feature)

(attached screenshot: "gold-60")
This score is lower than the results tested with gold mentions in your new paper.

@ylmeng
Collaborator

ylmeng commented Jan 24, 2019

Sorry for the delay. torch.max() just performs max-pooling over time steps, which is widely used with RNN-based models.
So instead of taking the output of the last time step, or the average over time steps, we take the element-wise maximum over time steps to represent the sequence.
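The three pooling strategies mentioned above can be compared on a small example. This is a standalone numpy sketch of the idea (the repository itself applies torch.max along dim=1 of a (batch, time, feature) tensor):

```python
import numpy as np

x = np.array([[[1.0, 4.0],
               [3.0, 2.0],
               [2.0, 5.0]]])  # shape (batch=1, time=3, feature=2)

last = x[:, -1, :]     # last time step     -> [[2., 5.]]
mean = x.mean(axis=1)  # average over time  -> [[2., 3.667]]
maxp = x.max(axis=1)   # max-pool over time -> [[3., 5.]]
# torch.max(x, dim=1) returns a (values, indices) pair;
# its values component equals maxp here.
```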
