Dataset with the entity label #15
Comments
Hi, Sorry for my late reply.
Note that, because the multilingual versions of DBpedia are extracted from the same source (Wikipedia), most of the aligned entities have the same name. In this case, using names to align these entities may achieve high accuracy. But in real entity alignment scenarios, such as aligning an English KG to a low-resource one, or cases where entity names are not available, methods that rely on entity names may not work well. So we do not recommend using entity names. More robust features and methods for entity alignment are worth exploring.
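To illustrate the point above, here is a minimal sketch of an exact-label-matching baseline. The function name `name_match_hits_at_1`, the entity IDs, and the toy labels are all made up for illustration and are not part of OpenEA. It only shows why shared labels make alignment nearly trivial, and why it breaks down as soon as labels differ across languages.

```python
def name_match_hits_at_1(labels_a, labels_b, gold_pairs):
    """Fraction of gold pairs recovered by exact (case-insensitive) label match.

    labels_a / labels_b: dict mapping entity id -> label (hypothetical format).
    gold_pairs: list of (entity_in_a, entity_in_b) reference alignments.
    """
    # Index KG-B entities by normalized label; on duplicate labels keep the first seen.
    index = {}
    for ent, label in labels_b.items():
        index.setdefault(label.strip().lower(), ent)

    hits = sum(
        1 for a, b in gold_pairs
        if index.get(labels_a[a].strip().lower()) == b
    )
    return hits / len(gold_pairs) if gold_pairs else 0.0


# Toy data, invented for this example only.
labels_en = {"en:1": "Michael Jordan", "en:2": "Paris", "en:3": "Germany"}
labels_fr = {"fr:1": "Michael Jordan", "fr:2": "Paris", "fr:3": "Allemagne"}
gold = [("en:1", "fr:1"), ("en:2", "fr:2"), ("en:3", "fr:3")]

score = name_match_hits_at_1(labels_en, labels_fr, gold)
print(f"Hits@1 from labels alone: {score:.2f}")
```

In this toy case, two of the three pairs are found from labels alone; only the pair whose label was actually translated is missed.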
Hi,
Hi @MrYxJ, apologies again for the late reply. You are indeed right. To elaborate a bit more, we would also point out that there can be issues of fair comparison and test data leakage in cross-lingual EA in some prior studies that incorporate entity names. This is not essentially due to embedding entity names, but due to some additional cross-lingual supervision labels/signals. E.g., in the original RDGCN and GCN-JE papers, the authors used Google Translate to translate the surface forms of entities in all other languages into English, then initialized the entity embeddings in their model with pre-trained word embeddings of the translated entity names. This is problematic in two ways:
For the above point 2, it is unfortunate to see that a few other more recent works are (we believe, erroneously) following such an unfair evaluation protocol, which we strongly advise against. In fact, a few other studies have already recognized this issue and have set good examples by separating the with-MT and without-MT cases into two evaluation settings (e.g., the HMAN and MRAEA papers). Some works have also explicitly pointed out this issue (e.g., the AttrGCN, JEANS and EVA papers). We will continue to clarify this fair comparison issue in future publications and releases of OpenEA. Note: the above issue only applies to cross-lingual EA. For monolingual EA, where training monolingual embeddings or directly comparing entity names requires no cross-lingual training labels, using entity names does not violate fair comparison. It is still definitely worth examining how well a system performs without entity names and with only the structural information, since in many KBs (especially biomedical ones) there might not be meaningful entity names. -Muhao
Hello author, in the paper, the following part is mentioned.
"Considering that DBpedia, Wikidata and YAGO collect data from very similar sources (mainly, Wikipedia), the aligned entities usually have identical labels. They would become “tricky” features for entity alignment and influence the evaluation of real performance. According to the suggestion in [95], we delete entity label"
May I know whether this label refers to the type of the entity (for example, the type of Michael_Jordan is Person)?
Do you still have the dataset with all the labels? I would like to see whether these labels could help the embedding in some interesting way. If not, I might have to crawl DBpedia and Wikidata myself.
Thanks!