Fix rank computation in the RGCN link prediction example #4688
This PR fixes a common problem in the ranking protocol of KG link prediction models.
Right now, the script puts the true entity at the very start of the list of entities to rank:
pytorch_geometric/examples/rgcn_link_pred.py
Line 132 in 9761ccf
Then, the script runs `argsort` over the model scores:
pytorch_geometric/examples/rgcn_link_pred.py
Lines 138 to 139 in 9761ccf
Here is the problem:
When a model returns exactly the same score for the true entity and for other entities in the list, the ranking becomes incorrect, that is, overly optimistic: the true entity sits at position 0, so ties can resolve in its favor. This behavior was identified in the Sun et al. ACL 2020 paper.
To fix this problem, the community (e.g., in PyKEEN) uses the "realistic" rank, which is the average of the optimistic and pessimistic ranks.
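For reference, the three ranks can be written as follows. The notation is introduced here for illustration and is not taken from the PR or from the script: $s^{*}$ is the score of the true entity, $s_1, \dots, s_n$ are the scores of all candidates (the true entity included), and higher scores are assumed to be better.

$$
r^{+} = 1 + \bigl|\{\, i : s_i > s^{*} \,\}\bigr|, \qquad
r^{-} = \bigl|\{\, i : s_i \ge s^{*} \,\}\bigr|, \qquad
r = \frac{r^{+} + r^{-}}{2}
$$

Here $r^{+}$ is the optimistic rank (every tie is resolved in favor of the true entity), $r^{-}$ is the pessimistic rank (every tie is resolved against it), and $r$ is the realistic rank. Without ties, all three coincide.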
The effect is easy to check by feeding in a vector of all zeros, which imitates the case where the model predicts exactly the same score for the true entity at position 0 and for all other entities.
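The all-zeros check can be sketched in plain Python. This is a minimal illustration, not the code from the example script: the function names `argsort_rank` and `realistic_rank` are hypothetical, lists stand in for tensors, and the stable sort is an assumption that imitates a tie-breaking order in which index 0 stays first.

```python
def argsort_rank(scores, true_index=0):
    """1-based rank of the true entity via a descending argsort.

    sorted() is stable, so tied entries keep their original order and the
    true entity at index 0 wins every tie (assumed tie-breaking behavior).
    """
    perm = sorted(range(len(scores)), key=lambda i: -scores[i])
    return perm.index(true_index) + 1


def realistic_rank(scores, true_index=0):
    """Average of the optimistic and pessimistic 1-based ranks."""
    true_score = scores[true_index]
    # Optimistic: only strictly better candidates are ranked ahead.
    optimistic = 1 + sum(s > true_score for s in scores)
    # Pessimistic: every tied candidate is ranked ahead as well
    # (the count includes the true entity itself).
    pessimistic = sum(s >= true_score for s in scores)
    return (optimistic + pessimistic) / 2


scores = [0.0] * 10            # the model scores every entity identically
print(argsort_rank(scores))    # 1
print(realistic_rank(scores))  # 5.5
```

With ten identical scores, the argsort-based rank reports a perfect 1, while the realistic rank reports 5.5, the middle of the tied block.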
This PR changes the ranking function in the example script to the realistic ranking.