Fix rank computation in the RGCN link prediction example #4688

migalkin · 2022-05-20T20:10:28Z

This PR fixes a common problem in the ranking protocol of KG link prediction models.

Right now, the script places the true entity at the very start of the list of candidate entities to rank:

pytorch_geometric/examples/rgcn_link_pred.py

Line 132 in 9761ccf

tail = torch.cat([torch.tensor([dst]), tail])

Then the script runs argsort over the model scores:

pytorch_geometric/examples/rgcn_link_pred.py

Lines 138 to 139 in 9761ccf

    
perm = out.argsort(descending=True)
rank = int((perm == 0).nonzero(as_tuple=False).view(-1)[0])

Here is the problem: when the model returns exactly the same score for the true entity and for other entities in the list, the resulting rank is incorrect, namely overly optimistic, because the true entity sits at index 0 and can be placed ahead of the candidates it ties with. This behavior was identified in the Sun et al. ACL 2020 paper.

To fix this problem, the community (e.g., in PyKEEN) uses the "realistic" rank, which is the average of the optimistic and pessimistic ranks:

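def compute_rank(ranks):
    # `ranks` holds the model scores; the true entity is at index 0.
    # Realistic rank: the average of the optimistic and pessimistic ranks.
    true = ranks[0]
    optimistic = (ranks > true).sum() + 1  # ties resolved in favor of the true entity
    pessimistic = (ranks >= true).sum()    # ties resolved against it (count includes the true entity)
    return (optimistic + pessimistic).float() * 0.5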
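
To make the three quantities concrete, here is a minimal worked example with partial ties. It is a dependency-free sketch in plain Python rather than the PyTorch code from the PR; the helper name `rank_bounds` and the score values are hypothetical and chosen only for illustration.

```python
def rank_bounds(scores, true_index=0):
    """Return (optimistic, pessimistic, realistic) ranks of scores[true_index]."""
    true_score = scores[true_index]
    # Optimistic: only strictly better candidates are ranked ahead of the true entity.
    optimistic = sum(s > true_score for s in scores) + 1
    # Pessimistic: every tied candidate is ranked ahead as well;
    # the >= count already includes the true entity itself.
    pessimistic = sum(s >= true_score for s in scores)
    realistic = 0.5 * (optimistic + pessimistic)
    return optimistic, pessimistic, realistic


# Hypothetical scores: the true entity (index 0) ties with two other candidates,
# one candidate scores strictly higher, and one scores strictly lower.
scores = [0.5, 0.9, 0.5, 0.5, 0.1]
print(rank_bounds(scores))  # (2, 4, 3.0)
```

When there are no ties, the optimistic and pessimistic ranks coincide, so the realistic rank reduces to the usual rank.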

The difference between the two rank computations is easy to check by feeding in a vector of all zeros, which imitates a model that predicts exactly the same score for the true entity at position 0 and for all other entities:

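import torch

def old_rank(ranks):
    # argsort-based rank: position of index 0 (the true entity) in the sorted order
    perm = ranks.argsort(descending=True)
    rank = int((perm == 0).nonzero(as_tuple=False).view(-1)[0])
    return rank + 1

ranks = torch.zeros(10)  # identical scores for all 10 candidates

print(old_rank(ranks))      # 1 - incorrect, overly optimistic
print(compute_rank(ranks))  # 5.5 - correct, realistic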
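
A related way to see the issue is that a sort-based rank under ties depends on where the true entity happens to sit in the candidate list, while the realistic rank does not. The sketch below illustrates this in plain Python, using the stable built-in `sorted` as a stand-in for argsort. This is an assumption made for illustration: how `torch.argsort` orders tied elements is not something this sketch relies on or demonstrates. The helper names `sort_based_rank` and `realistic_rank` are hypothetical.

```python
def sort_based_rank(scores, true_index):
    # sorted() is stable, so tied candidates keep their input order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return order.index(true_index) + 1


def realistic_rank(scores, true_index):
    true_score = scores[true_index]
    optimistic = sum(s > true_score for s in scores) + 1
    pessimistic = sum(s >= true_score for s in scores)
    return 0.5 * (optimistic + pessimistic)


scores = [0.0] * 10  # all 10 candidates tie

# True entity placed first vs. last in the candidate list.
print(sort_based_rank(scores, 0), sort_based_rank(scores, 9))  # 1 10
print(realistic_rank(scores, 0), realistic_rank(scores, 9))    # 5.5 5.5
```

Under these assumptions, the sort-based rank swings between the best and the worst possible value purely because of list position, whereas the realistic rank stays at 5.5.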

This PR changes the ranking function in the example script to the realistic ranking.


codecov · 2022-05-20T20:13:31Z

Codecov Report

Merging #4688 (a2a299b) into master (c4977ea) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #4688   +/-   ##
=======================================
  Coverage   82.88%   82.88%           
=======================================
  Files         318      318           
  Lines       16820    16820           
=======================================
  Hits        13942    13942           
  Misses       2878     2878



rusty1s

Great. Thanks a lot!

migalkin and others added 2 commits May 20, 2022 15:55

compute ranks fix

ad82018

[pre-commit.ci] auto fixes from pre-commit.com hooks

b529395

for more information, see https://pre-commit.ci

migalkin added 2 commits May 20, 2022 16:25

pleasing PEP8

c9825d3

Merge branch 'rgcn_eval_fix' of https://github.com/migalkin/pytorch_g…

a2a299b

…eometric into rgcn_eval_fix

rusty1s assigned migalkin May 20, 2022

rusty1s added the bug, 0 - Priority P0, and example labels May 20, 2022

rusty1s approved these changes May 20, 2022


rusty1s merged commit c7ac550 into pyg-team:master May 20, 2022
