feat(algorithm): support biased second order random walk #280

Merged
merged 17 commits into from
Dec 4, 2023

diaohancai
Contributor

@diaohancai commented Nov 8, 2023

Purpose of the PR

Main Changes

The current random walk algorithm needs two additional features:

  1. Biased random walk.
  2. Second order random walk.

Add the following parameters:

    private String weightProperty;
    private Double defaultWeight;
    private Double minWeightThreshold;
    private Double maxWeightThreshold;
 
    private Double returnFactor;
    private Double inOutFactor;

  1. String weightProperty. Name of the edge property that holds the weight; enables a biased random walk. The higher an edge's weight, the more likely the walk follows that edge.
  2. Double defaultWeight. Weight used when the weight property is null.
  3. Double minWeightThreshold. Weights below this threshold are truncated to it, to avoid overly small weights.
  4. Double maxWeightThreshold. Weights above this threshold are truncated to it, to avoid overweighting.
  5. Double returnFactor. Controls the likelihood of walking back to the previously visited vertex.
  6. Double inOutFactor. Controls whether the walk stays near the previous vertex (inward) or moves away from it (outward).

For more details on returnFactor and inOutFactor, see the paper "node2vec: Scalable Feature Learning for Networks" (Grover and Leskovec, KDD 2016).
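For reference, node2vec biases each step like this: with t the previously visited vertex, v the current vertex, and x a candidate neighbor of v, the unnormalized transition weight is the edge weight w(v, x) multiplied by 1/p if x is t, by 1 if x is also a neighbor of t, and by 1/q otherwise. The sketch below illustrates that rule under stated assumptions: returnFactor is taken to play the role of p and inOutFactor the role of q, vertex IDs are plain Strings, and the previous vertex's adjacency is a Set. SecondOrderBias and biasedWeight are hypothetical names, not this PR's code.

```java
import java.util.Set;

/**
 * Hypothetical sketch of node2vec's second-order bias.
 * Assumption: returnFactor ~ p, inOutFactor ~ q; the PR's actual
 * calculateEdgeWeight() may differ in details.
 */
final class SecondOrderBias {

    static double biasedWeight(String preVertexId, Set<String> preVertexNeighbors,
                               String nextVertexId, double edgeWeight,
                               double returnFactor, double inOutFactor) {
        if (preVertexId == null) {
            // First step of a walk: no previous vertex, so only the edge weight counts.
            return edgeWeight;
        }
        if (nextVertexId.equals(preVertexId)) {
            // Candidate is the previous vertex itself: going back, scaled by 1/p.
            return edgeWeight / returnFactor;
        }
        if (preVertexNeighbors.contains(nextVertexId)) {
            // Candidate is also adjacent to the previous vertex: unscaled.
            return edgeWeight;
        }
        // Candidate is farther from the previous vertex: moving outward, scaled by 1/q.
        return edgeWeight / inOutFactor;
    }

    public static void main(String[] args) {
        // Previous vertex t = "a" with neighbors {"b", "c"}; p = 2.0, q = 0.5.
        Set<String> neighborsOfT = Set.of("b", "c");
        System.out.println(biasedWeight("a", neighborsOfT, "a", 1.0, 2.0, 0.5)); // 0.5
        System.out.println(biasedWeight("a", neighborsOfT, "b", 1.0, 2.0, 0.5)); // 1.0
        System.out.println(biasedWeight("a", neighborsOfT, "d", 1.0, 2.0, 0.5)); // 2.0
    }
}
```

Under this mapping, a large returnFactor discourages backtracking, while a small inOutFactor pushes the walk outward (DFS-like). The resulting weights for all out-edges of v would then feed a weighted random choice.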

Verifying these changes

  • Trivial rework / code cleanup without any test coverage. (No Need)
  • Already covered by existing tests, such as (please modify tests here).
  • Need tests and can be verified as follows.
    Unit test: org.apache.hugegraph.computer.algorithm.sampling.RandomWalkTest

Does this PR potentially affect the following parts?

  • Nope
  • Dependencies (add/update license info)
  • Modify configurations
  • The public API
  • Other affects (typed here)

Documentation Status

  • Doc - TODO
  • Doc - Done
  • Doc - No Need

@diaohancai changed the title from "Feat second order random walk" to "feat(algorithm): support biased second order random walk" on Nov 8, 2023

codecov bot commented Nov 9, 2023

Codecov Report

Attention: 29 lines in your changes are missing coverage. Please review.

Comparison is base (ff85e34) 85.03% compared to head (7d6e8ca) 84.99%.
Report is 4 commits behind head on master.

❗ Current head 7d6e8ca differs from pull request most recent head 94bd35b. Consider uploading reports for the commit 94bd35b to get more accurate results

Files Patch % Lines
...egraph/computer/algorithm/sampling/RandomWalk.java 78.86% 11 Missing and 15 partials ⚠️
...hugegraph/computer/core/graph/value/ListValue.java 33.33% 1 Missing and 1 partial ⚠️
...computer/algorithm/sampling/RandomWalkMessage.java 95.65% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master     #280      +/-   ##
============================================
- Coverage     85.03%   84.99%   -0.04%     
- Complexity     3246     3291      +45     
============================================
  Files           345      349       +4     
  Lines         12298    12472     +174     
  Branches       1102     1129      +27     
============================================
+ Hits          10458    10601     +143     
- Misses         1315     1328      +13     
- Partials        525      543      +18     

Contributor

@javeme left a comment

some tiny comments

}

// weight threshold truncation
if ((Double) weight.value() < this.minWeightThreshold) {
Contributor

can we call weight.doubleValue() here?

Contributor Author

The property value may not be numeric.

}

/**
* get edge weight by weight property
Contributor

"Get the weight of an edge by its weight property"

/**
* get edge weight by weight property
*/
private Value getWeight(Edge edge) {
Contributor

Prefer the name getEdgeWeight().

}
LOG.info("[RandomWalk] algorithm param, {}: {}", OPTION_WALK_LENGTH, walkLength);
LOG.info("[RandomWalk] algorithm param, {}: {}", OPTION_WALK_LENGTH, this.walkLength);
Contributor

We could add a common method like logAlgorithmParam(name, value), and just use this.name() as the logged algorithm name.

Contributor Author

On reflection, I think logging of algorithm parameters belongs in the framework rather than here. I'll remove those logs.
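For illustration only, the reviewer's logAlgorithmParam(name, value) idea could be a single formatter shared by all parameters, keyed by the algorithm's name. This is a hypothetical sketch: AlgorithmParamLog and formatAlgorithmParam are invented names, the "walk_length" key stands in for whatever OPTION_WALK_LENGTH actually holds, and it prints to System.out instead of the project's LOG. The author ultimately chose to move parameter logging into the framework instead.

```java
/**
 * Hypothetical sketch of a shared formatter for algorithm-parameter log lines.
 */
final class AlgorithmParamLog {

    // Produces the same message shape as the existing
    // "[RandomWalk] algorithm param, {}: {}" log statements.
    static String formatAlgorithmParam(String algorithmName, String paramName, Object value) {
        return String.format("[%s] algorithm param, %s: %s", algorithmName, paramName, value);
    }

    public static void main(String[] args) {
        // In the algorithm itself, "RandomWalk" would come from this.name()
        // and the message would be passed to LOG.info(...).
        System.out.println(formatAlgorithmParam("RandomWalk", "walk_length", 10));
    }
}
```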

@diaohancai
Contributor Author

> some tiny comments

Thank you for your guidance.

@diaohancai requested a review from javeme November 13, 2023 08:58
}

// weight threshold truncation
if ((Double) weight.value() < this.minWeightThreshold) {
DoubleValue weight = (DoubleValue) property;
Contributor

Work with a plain double value instead, like:

  1. double weight = this.defaultWeight;
  2. weight = property.doubleValue() if the check passes
  3. do truncation
  4. return weight
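The four steps above could look roughly like this. It is a sketch, not the merged getEdgeWeight(): the property is modeled as a plain Object (the real code works with HugeGraph's Value types), the numeric check uses instanceof Number, and "truncation" is assumed to mean clamping into [minWeightThreshold, maxWeightThreshold]. EdgeWeights is a hypothetical class name.

```java
/**
 * Hypothetical sketch of resolving an edge weight: default, numeric check, clamp.
 */
final class EdgeWeights {

    static double getEdgeWeight(Object property, double defaultWeight,
                                double minWeightThreshold, double maxWeightThreshold) {
        // 1. Start from the default weight (covers a missing/null property).
        double weight = defaultWeight;
        // 2. Use the property only if it is actually numeric.
        if (property instanceof Number) {
            weight = ((Number) property).doubleValue();
        }
        // 3. Truncate (clamp) into the configured thresholds.
        if (weight < minWeightThreshold) {
            weight = minWeightThreshold;
        } else if (weight > maxWeightThreshold) {
            weight = maxWeightThreshold;
        }
        // 4. Return the resolved weight.
        return weight;
    }

    public static void main(String[] args) {
        System.out.println(getEdgeWeight(null, 1.0, 0.1, 10.0));    // 1.0 (missing)
        System.out.println(getEdgeWeight("heavy", 1.0, 0.1, 10.0)); // 1.0 (not numeric)
        System.out.println(getEdgeWeight(0.01, 1.0, 0.1, 10.0));    // 0.1 (below min)
        System.out.println(getEdgeWeight(50, 1.0, 0.1, 10.0));      // 10.0 (above max)
    }
}
```

Returning a primitive double also matches the follow-up suggestion to keep finalWeight as a double.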

private Double calculateWeight(Id preVertexId, IdList preVertexAdjacenceIdList,
Id nextVertexId, Value weight) {
private Double calculateEdgeWeight(Id preVertexId, IdList preVertexAdjacenceIdList,
Id nextVertexId, double weight) {
Contributor

Also use a primitive double for finalWeight and return double?

double weight = this.getEdgeWeight(edge);
Double finalWeight = this.calculateEdgeWeight(preVertexId, preVertexAdjacenceIdList,
edge.targetId(), weight);
weightList.add(finalWeight);
Contributor

Can we add a marker here: "TODO: improve to avoid OOM"?

Edge selectedEdge = null;
int randomNum = random.nextInt(edges.size());
private Edge randomSelectEdge(Id preVertexId, IdList preVertexAdjacenceIdList, Edges edges) {
List<Double> weightList = new ArrayList<>();
Contributor

Can we add a marker here: "TODO: use primitive array instead, like DoubleArray, in order to reduce memory fragmentation generated during calculations"?
The same applies to the lists found by https://github.com/search?q=repo%3Aapache%2Fincubator-hugegraph-computer+path%3A%2F%5Ecomputer-algorithm%5C%2F%2F++new+ArrayList&type=code

Contributor Author

You mean double[]? I'll submit a new issue for this and try to fix it.
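As an illustration of the double[] idea, a weighted edge choice can keep its running totals in a primitive array instead of a List<Double>, avoiding one boxed Double per edge. This sketch assumes the weights are already computed, non-negative, and not all zero; WeightedSelection and select are hypothetical names, and it uses plain cumulative-sum sampling, which is not necessarily what the merged randomSelectEdge does.

```java
import java.util.Random;

/**
 * Hypothetical sketch: pick index i with probability weights[i] / sum(weights),
 * using only primitive double[] storage.
 */
final class WeightedSelection {

    static int select(double[] weights, Random random) {
        // Running totals in a primitive array (no boxing).
        double[] cumulative = new double[weights.length];
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i];
            cumulative[i] = total;
        }
        // Uniform point in [0, total); the first bucket that exceeds it wins.
        double r = random.nextDouble() * total;
        for (int i = 0; i < cumulative.length; i++) {
            if (r < cumulative[i]) {
                return i;
            }
        }
        // Floating-point round-off can leave r at total: fall back to the last
        // index with a positive weight.
        for (int i = weights.length - 1; i >= 0; i--) {
            if (weights[i] > 0) {
                return i;
            }
        }
        throw new IllegalArgumentException("at least one weight must be positive");
    }

    public static void main(String[] args) {
        double[] weights = {1.0, 3.0};
        Random random = new Random(42);
        int[] counts = new int[weights.length];
        for (int i = 0; i < 10_000; i++) {
            counts[select(weights, random)]++;
        }
        // Expect roughly a 1:3 split between the two indices.
        System.out.println(counts[0] + " vs " + counts[1]);
    }
}
```

For vertices with many edges, java.util.Arrays.binarySearch over the cumulative array could replace the linear scan.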

Contributor Author

awesome

@diaohancai requested a review from javeme November 15, 2023 07:31

@@ -128,6 +128,8 @@ private static List<Serializable> expressions(
if (filter.size() == 0) {
return PASS;
}
// TODO: use primitive array instead, like DoubleArray,
Contributor

We only need to mark lists used in algorithms.

}

@Override
protected List<String> value(Vertex vertex) {
IdListList value = vertex.value();
// TODO: use primitive array instead, like DoubleArray,
// in order to reduce memory fragmentation generated during calculations
Contributor

List<String> doesn't need replacing (strings have no primitive array equivalent, unlike double[]), and propValues doesn't look very large.

@diaohancai requested a review from javeme November 20, 2023 08:32
Contributor

@javeme left a comment

LGTM

Member

@imbajin left a comment

THX~

@diaohancai
Contributor Author

> THX~

Thank you for your guidance.

@imbajin merged commit 2be0c28 into apache:master Dec 4, 2023
6 checks passed
Labels
None yet
Projects
Status: Done
3 participants