[BUG] Pagerank Online doesn't update correctly when the graph is changed #510

pippellia-btc · 2024-09-12T15:14:19Z

Describe the bug
There are various problems with the implementation of pagerank_online (https://memgraph.com/docs/advanced-algorithms/available-algorithms/pagerank_online) :

(small) the parameter epsilon isn't coherent with the non-online version of pagerank (which uses d, the dampening).
d is supposed to be 1 - epsilon, but the default values don't match. My suggestion is to use either d or epsilon in both algos.
(medium) The CalculatePagerank function (see here ) takes the vector of all the visits, then divides it by a constant and then normalizes it.
This doesn't make any sense, since normalizing right away would be faster and give the same result.
(urgent) The algorithm simply doesn't return the correct values when the graph gets updates, as shown in the next section

To Reproduce
Steps to reproduce the behavior, starting from an empty database:

Create the graph
CREATE (a:Person {name: 'A'}), (b:Person {name: 'B'}), (c:Person {name: 'C'}), (a)-[:FOLLOWS]->(b), (b)-[:FOLLOWS]->(c), (c)-[:FOLLOWS]->(a);
Set pagerank_online
CALL pagerank_online.set(1000, 0.15) YIELD node, rank;
create a trigger so pagerank_online is updated when the graph changes
CREATE TRIGGER pagerank_trigger BEFORE COMMIT EXECUTE CALL pagerank_online.update(createdVertices, createdEdges, deletedVertices, deletedEdges) YIELD node, rank
Update the graph
MATCH (c:Person {name: 'C'}) CREATE (d:Person {name: 'D'}) CREATE (c)-[:FOLLOWS]->(d);
Get new pagerank values:
CALL pagerank_online.get() YIELD node, rank RETURN node, rank;
Compare the values with other implementations of the pagerank algorithm:

import networkx as nx

# Create an empty directed graph
G = nx.DiGraph()

# Add nodes to the graph
nodes = ['A', 'B', 'C', 'D']
G.add_nodes_from(nodes)

# Add edges to the graph
edges = [('A','B'), ('B','C'), ('C','A'), ('C','D')]
G.add_edges_from(edges)

alpha = 0.85
print(nx.pagerank(G, alpha))

or use this simple website tool: http://computerscience.chemeketa.edu/cs160Reader/_static/pageRankApp/index.html

Expected behavior
It should give the correct results

Video
Here is a link to a video where I comment these issues and provide additional context.
It's on your Discord server: https://discord.com/channels/842007348272169002/852201290880385044/1280112013921619999

The text was updated successfully, but these errors were encountered:

katarinasupe · 2024-09-13T08:17:43Z

Hi @pippellia-btc, thank you for opening the issue 🙏

For concerns 1 and 2, I'll let the engineers dive deeper once the issue becomes a priority.

Regarding 3, the fact that the values are not the same might be related to the fact that the online algorithm is only an approximation of PageRank to make it as fast as possible. Still, it carries the same information—the likelihood of a random walk ending in a particular vertex. Does that make sense?

Does this block your further work with Memgraph?

pippellia-btc · 2024-09-13T09:59:49Z

Hey katarina, thanks for the fast reply.

Regarding 3, the fact that the values are not the same might be related to the fact that the online algorithm is only an approximation of PageRank to make it as fast as possible. Still, it carries the same information—the likelihood of a random walk ending in a particular vertex. Does that make sense?

No, that's not the reason. As I show in the video, when one re-set it with
CALL pagerank_online.set(1000, 0.15) YIELD node, rank;
the error is small (<1%).

When I change the graph the error is order of magnitudes higher, like ~50%. In the paper it is shown that the error should always be quite small with that many random walks (1000 in the example), so it must be a bug in the implementation.

Does this block your further work with Memgraph?

Yes, because I'll have to rewrite most of the code myself for the use-case I have in mind.

katarinasupe · 2024-09-13T11:30:23Z

Thank you for the further explanation and a really great video, @pippellia-btc.
Just one small question - what happens if you set epsilon to default 1.0 or 0.85 instead of 0.15? I tried that and it seems that the values are then close to what you expected?

EDIT: Oh sorry, that might be related to me resetting, like you mentioned at the end of the video.

katarinasupe · 2024-09-13T11:37:15Z

I agree with you that it would be better if it's consistent, having just d in both pagerank and pagerank online. Not sure why there was a choice to implement it differently, but probably a different dev worked on each. Still, not sure if it's good to change API and a priority is definitely to have it working correctly. I read on Discord that you tried changing half epsilon and that didn't work. In case you have some other idea on how to fix it, feel free to open a PR since MAGE is an open source project. I would be really happy to see outside contributions 😄

pippellia-btc added the bug bug label Sep 12, 2024

pippellia-btc assigned gitbuda Sep 12, 2024

katarinasupe added Severity - S3 Severity - S3 Frequency - Monthly Frequency - Monthly Reach - VeryFew Reach - VeryFew labels Sep 13, 2024

katarinasupe added Severity - S1 Severity - S1 and removed Severity - S3 Severity - S3 labels Sep 13, 2024

katarinasupe added the community community label Sep 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Pagerank Online doesn't update correctly when the graph is changed #510

[BUG] Pagerank Online doesn't update correctly when the graph is changed #510

pippellia-btc commented Sep 12, 2024

katarinasupe commented Sep 13, 2024

pippellia-btc commented Sep 13, 2024

katarinasupe commented Sep 13, 2024 •

edited

Loading

katarinasupe commented Sep 13, 2024

[BUG] Pagerank Online doesn't update correctly when the graph is changed #510

[BUG] Pagerank Online doesn't update correctly when the graph is changed #510

Comments

pippellia-btc commented Sep 12, 2024

katarinasupe commented Sep 13, 2024

pippellia-btc commented Sep 13, 2024

katarinasupe commented Sep 13, 2024 • edited Loading

katarinasupe commented Sep 13, 2024

katarinasupe commented Sep 13, 2024 •

edited

Loading