[BUG] Pagerank Online doesn't update correctly when the graph is changed #510
Comments
Hi @pippellia-btc, thank you for opening the issue 🙏 For concerns 1 and 2, I'll let the engineers dive deeper once the issue becomes a priority. Regarding 3, the fact that the values are not the same might be related to the fact that the online algorithm is only an approximation of PageRank to make it as fast as possible. Still, it carries the same information: the likelihood of a random walk ending in a particular vertex. Does that make sense? Does this block your further work with Memgraph?
Hey katarina, thanks for the fast reply.
No, that's not the reason. As I show in the video, when one re-set it with When I change the graph, the error is orders of magnitude higher, around 50%. The paper shows that the error should always be quite small with that many random walks (1000 in the example), so it must be a bug in the implementation.
Yes, because I'll have to rewrite most of the code myself for the use case I have in mind.
Thank you for the further explanation and a really great video, @pippellia-btc. EDIT: Oh sorry, that might be related to me resetting, like you mentioned at the end of the video.
I agree with you that it would be better if it's consistent, having just |
Describe the bug
There are various problems with the implementation of pagerank_online (https://memgraph.com/docs/advanced-algorithms/available-algorithms/pagerank_online):
(small) The parameter epsilon isn't consistent with the non-online version of pagerank, which uses d, the damping factor. d is supposed to be 1 - epsilon, but the default values don't match. My suggestion is to use either d or epsilon in both algorithms.
(medium) The CalculatePagerank function (see here) takes the vector of all the visits, divides it by a constant, and then normalizes it. This doesn't make any sense, since normalizing right away would be faster and give the same result.
(urgent) The algorithm simply doesn't return the correct values when the graph is updated, as shown in the next section.
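To illustrate the (medium) point, here is a minimal Python sketch showing that dividing the visit counts by a constant before normalizing changes nothing. The visit counts and the constant are made up for the example, and the helper names are hypothetical; this is not the actual CalculatePagerank code. Exact fractions are used so the comparison is not blurred by floating-point rounding.

```python
from fractions import Fraction


def normalize(values):
    """Scale values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def scale_then_normalize(visits, constant):
    """Divide every count by a constant first, then normalize."""
    scaled = [v / constant for v in visits]
    return normalize(scaled)


# Made-up visit counts for four nodes (illustrative only).
visits = [Fraction(n) for n in (1200, 950, 870, 410)]

direct = normalize(visits)
# Hypothetical constant; any non-zero value cancels out in the normalization.
two_step = scale_then_normalize(visits, Fraction(1000 * 4))

print(direct == two_step)  # True: the constant cancels out
```

The constant appears in both the numerator and the denominator of the normalization, so it cancels; the extra division only costs one more pass over the vector.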
To Reproduce
Steps to reproduce the behavior, starting from an empty database:
Create the graph
CREATE (a:Person {name: 'A'}), (b:Person {name: 'B'}), (c:Person {name: 'C'}), (a)-[:FOLLOWS]->(b), (b)-[:FOLLOWS]->(c), (c)-[:FOLLOWS]->(a);
Set pagerank_online
CALL pagerank_online.set(1000, 0.15) YIELD node, rank;
Create a trigger so pagerank_online is updated when the graph changes
CREATE TRIGGER pagerank_trigger BEFORE COMMIT EXECUTE CALL pagerank_online.update(createdVertices, createdEdges, deletedVertices, deletedEdges) YIELD node, rank;
Update the graph
MATCH (c:Person {name: 'C'}) CREATE (d:Person {name: 'D'}) CREATE (c)-[:FOLLOWS]->(d);
Get new pagerank values:
CALL pagerank_online.get() YIELD node, rank RETURN node, rank;
Compare the values with other implementations of the pagerank algorithm:
or use this simple website tool: http://computerscience.chemeketa.edu/cs160Reader/_static/pageRankApp/index.html
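One way to make that comparison locally is a small power-iteration PageRank in plain Python, run on the updated four-node graph from the steps above. This is only a sketch under stated assumptions: the damping factor is taken as d = 1 - epsilon = 0.85 (assuming the second argument of pagerank_online.set is epsilon), and the rank of a node with no outgoing edges (D here) is redistributed uniformly, which is one common convention but not the only one. The function name and its parameters are my own, not part of any Memgraph API.

```python
def pagerank(edges, nodes, damping=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank; dangling mass is spread uniformly (assumed convention)."""
    n = len(nodes)
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank currently held by nodes without outgoing edges.
        dangling = sum(rank[v] for v in nodes if not out[v])
        # Teleport term plus the uniformly redistributed dangling mass.
        base = (1.0 - damping) / n + damping * dangling / n
        new = {v: base for v in nodes}
        for v in nodes:
            for w in out[v]:
                new[w] += damping * rank[v] / len(out[v])
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Graph after the update step: A -> B -> C -> A, plus C -> D.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]

ranks = pagerank(edges, nodes, damping=1 - 0.15)
for name in nodes:
    print(name, round(ranks[name], 4))
```

Because A and D each receive exactly half of C's outgoing rank and nothing else, they should end up with the same value under this convention, which gives a quick sanity check against whatever pagerank_online.get() returns.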
Expected behavior
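After the update, pagerank_online.get() should return values close to the exact PageRank of the new graph, with only the small error expected from 1000 random walks per node.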
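To give a rough sense of how small the error should be, the following Python sketch estimates ranks from scratch with 1000 random walks per node on the updated graph and compares them with the values those walks converge to. It is a toy model under assumptions, not the Memgraph implementation: each walk stops with probability epsilon = 0.15 at every step or when it reaches a node with no outgoing edges, and the estimate is simply the normalized visit counts. All function names are hypothetical.

```python
import random


def walk_visit_counts(out_edges, walks_per_node, epsilon, rng):
    """Count node visits over random walks started from every node."""
    visits = {v: 0 for v in out_edges}
    for start in out_edges:
        for _ in range(walks_per_node):
            node = start
            while True:
                visits[node] += 1
                # Assumed stopping rule: dead end, or stop with probability epsilon.
                if not out_edges[node] or rng.random() < epsilon:
                    break
                node = rng.choice(out_edges[node])
    return visits


def expected_visit_shares(out_edges, epsilon, iterations=200):
    """Expected visits per node, x[v] = 1 + (1 - eps) * sum over u->v of x[u] / outdeg(u), normalized."""
    x = {v: 1.0 for v in out_edges}
    for _ in range(iterations):
        new = {v: 1.0 for v in out_edges}
        for u, targets in out_edges.items():
            for w in targets:
                new[w] += (1 - epsilon) * x[u] / len(targets)
        x = new
    total = sum(x.values())
    return {v: x[v] / total for v in x}


# Graph after the update step: A -> B -> C -> A, plus C -> D.
out_edges = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}

rng = random.Random(42)  # fixed seed so the run is repeatable
visits = walk_visit_counts(out_edges, walks_per_node=1000, epsilon=0.15, rng=rng)
total_visits = sum(visits.values())
estimate = {v: visits[v] / total_visits for v in visits}
exact = expected_visit_shares(out_edges, epsilon=0.15)

max_error = max(abs(estimate[v] - exact[v]) for v in out_edges)
print("max absolute error:", max_error)
```

With this many walks the estimate should land within a few percentage points of the expected shares at worst, which is why an error of around 50% after an incremental update points at the update logic rather than at sampling noise.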
Video
Here is a link to a video where I comment on these issues and provide additional context.
It's on your Discord server: https://discord.com/channels/842007348272169002/852201290880385044/1280112013921619999