Accuracy of baseline values #13
Hi,

I am using the latest code available in the repo and have trained both baselines (graph parser and sequence labeling). I then measured the Sentiment Tuple F1 on the dev.json file and I am getting these values:

For Graph Parsing, the values are around 0.5xx, except for Darmstadt Unis, MPQA and NoReC, which are lower, especially Darmstadt Unis.
For Sequence Labeling, the values are around 0.3xx, except for Darmstadt Unis, MPQA and NoReC, which are lower, especially MPQA.

Are these values correct to take as a reference, or am I doing something wrong when training the models or running inference to get the scores?

Regards.
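For context, the Sentiment Tuple F1 mentioned above scores predicted (holder, target, expression, polarity) tuples against the gold tuples in dev.json. Below is a simplified sketch assuming exact tuple matching; the task's official scorer may additionally credit partial span overlap, so treat this as an illustration rather than the repo's evaluation code.

```python
# Simplified sketch of a tuple-level F1: a prediction counts as a true
# positive only if the full (holder, target, expression, polarity) tuple
# matches a gold tuple exactly. Spans are represented as hashable
# token-index tuples. This is an illustration, not the official scorer.

def tuple_f1(gold: set, pred: set) -> float:
    tp = len(gold & pred)                       # exact tuple matches
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: one of two gold tuples is recovered -> F1 = 0.667
gold = {((0,), (2, 3), (5,), "Positive"),
        ((0,), (7,), (9, 10), "Negative")}
pred = {((0,), (2, 3), (5,), "Positive")}
print(round(tuple_f1(gold, pred), 3))  # 0.667
```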
Hi Iago, the values look pretty normal, except for Darmstadt with the Graph Parsing approach. The OpeNER/MultiBooked datasets are right where I would expect, NoReC is always a bit lower because it's a more diverse dataset, and MPQA is quite hard because of the ambiguity of many of the polar expressions and the size of the holders and targets. But Darmstadt using the graph parser should be higher than MPQA.

Thanks for the clarification. Could the issue be caused by this change? #9

I doubt it. Those issues aren't common enough to cause a large drop in performance, and the original paper we take the baseline from (https://aclanthology.org/2021.acl-long.263/) used the same data. Perhaps it's the effect of a particularly poor random seed, as there is some variance (±2.0 in the paper)?
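One way to check the seed hypothesis is to repeat training under a handful of seeds and look at the spread of the resulting scores. A minimal sketch follows; train_and_eval is a hypothetical wrapper around the repo's training and inference scripts that returns the Sentiment Tuple F1 (the repo itself may expose this differently).

```python
# Hypothetical sketch for checking seed variance. train_and_eval() is a
# stand-in for wrapping this repo's training + inference scripts and
# returning the resulting Sentiment Tuple F1; it is not part of the repo.
import random
import statistics

import numpy as np
import torch

def set_seed(seed: int) -> None:
    # Seed the RNGs that typically drive training nondeterminism.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

scores = []
for seed in (101, 202, 303, 404, 505):
    set_seed(seed)
    scores.append(train_and_eval(seed))  # hypothetical wrapper

# If one run sits far outside mean +/- stdev, something more than seed
# variance is likely going on.
print(f"mean={statistics.mean(scores):.3f}  stdev={statistics.stdev(scores):.3f}")
```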
Ah, ok, I'll try to train it again. Thank you.