Sample for nebula-ngql data source #72

porscheme · 2023-03-22T01:18:55Z

General Question

Per the comment in the application.conf file, the data source can be nebula-ngql. Can you please provide a sample? I want to try this feature.

Thanks

Below is an extract from the application.conf file

data: {
    # data source. optional of nebula,nebula-ngql,csv,json
    source: csv
    # data sink, means the algorithm result will be write into this sink. optional of nebula,csv,text
    sink: csv
    # if your algorithm needs weight
    hasWeight: false
  }


wey-gu · 2023-03-22T03:39:30Z

It should look like this. @Nicole00, could you help confirm that this will work? If so, I can prepare a PR adding examples to the conf file.

data: {
    # data source. optional of nebula,nebula-ngql,csv,json
    source: nebula-ngql
...
  nebula: {
    read: {
        metaAddress: "127.0.0.1:9559"
        graphAddress: "127.0.0.1:9669"
        space: basketballplayer
        labels: ["follow"]
        weightCols: ["degree"]
        ngql: "MATCH ()-[e:follow]->() RETURN e LIMIT 100000"
    }

porscheme · 2023-03-22T04:04:21Z

Thanks @wey-gu for the quick reply.
It looks like nebula-algorithm doesn't work with string VIDs. Can you confirm?
I also see the note below. How do I convert our string VIDs to integers using the algorithm interface?

For non-integer String data, it is recommended to use the algorithm interface. You can use the dense_rank function of SparkSQL to encode the data as the Long type instead of the String type.

wey-gu · 2023-03-22T04:10:13Z

Actually, it now supports numeric VID generation and automatic mapping. Just add encodeId:true to the algo config; see #68.

porscheme · 2023-03-22T04:17:12Z

You mean like below?

  algorithm: {
    executeAlgo: node2vec
    node2vec: {
      encodeId: true,
      maxIter: 5,
      lr: 0.025,
      dataNumPartition: 15,
      modelNumPartition: 10,
      dim: 9,
      window: 2,
      walkLength: 4,
      numWalks: 10,
      p: 05,
      q: 0.5,
      directed: false,
      degree: 2,
      embSeparate: ",",
      modelPath: "/mnt/data/sparkdata/word2vec"
    }
  }

wey-gu · 2023-03-22T04:46:01Z

Yes

porscheme · 2023-03-22T04:49:14Z

I'm getting the error below, and I'm not sure why.
In the log, "0033af94-95f2-ec6d-ac72-f75f4d00622a" is a VID.

{"level":"WARN","timestamp":"2023-03-22 04:43:17,806","thread":"main","message":"The jar local:///mnt/spark/work/nebula-algorithm-3.0-SNAPSHOT.jar has been added already. Overwriting of added jars is not supported in the current version."}
{"level":"WARN","timestamp":"2023-03-22 04:43:18,145","thread":"main","message":"returnCols is empty and your result will contain all properties for HAS_CONDITION"}
{"level":"WARN","timestamp":"2023-03-22 04:43:20,948","thread":"Executor task launch worker for task 0","message":"Putting block rdd_6_0 failed due to exception java.lang.NumberFormatException: For input string: "0033af94-95f2-ec6d-ac72-f75f4d00622a"."}
{"level":"WARN","timestamp":"2023-03-22 04:43:20,949","thread":"Executor task launch worker for task 0","message":"Block rdd_6_0 could not be removed as it was not found on disk or in memory"}
{"level":"ERROR","timestamp":"2023-03-22 04:43:20,959","thread":"Executor task launch worker for task 0","message":"Exception in task 0.0 in stage 0.0 (TID 0)"}
java.lang.NumberFormatException: For input string: "0033af94-95f2-ec6d-ac72-f75f4d00622a"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Long.parseLong(Long.java:589)
	at java.lang.Long.parseLong(Long.java:631)
	at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:277)
	at scala.collection.immutable.StringOps.toLong(StringOps.scala:29)
	at com.vesoft.nebula.algorithm.utils.NebulaUtil$$anonfun$1.apply(NebulaUtil.scala:29)
	at com.vesoft.nebula.algorithm.utils.NebulaUtil$$anonfun$1.apply(NebulaUtil.scala:25)
	at org.apache.spark.sql.execution.MapElementsExec$$anonfun$7$$anonfun$apply$1.apply(objects.scala:236)
	at org.apache.spark.sql.execution.MapElementsExec$$anonfun$7$$anonfun$apply$1.apply(objects.scala:236)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
	at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:107)
	at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:105)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
	at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:359)
	at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:357)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1165)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:357)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:308)
	at org.apache.spark.graphx.EdgeRDD.compute(EdgeRDD.scala:50)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
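
The trace above fails in Long.parseLong, reached from NebulaUtil.scala:29 through StringOps.toLong, which indicates that each VID is being parsed as a 64-bit integer. A minimal pre-flight check, sketched in plain Python and not taken from nebula-algorithm, can flag such VIDs before a job is submitted. The function names are hypothetical, and the pattern only approximates Java's Long.parseLong rules (optional sign, ASCII decimal digits, signed 64-bit range).

```python
import re

# Approximation of what java.lang.Long.parseLong accepts:
# an optional sign followed by ASCII decimal digits.
_LONG_PATTERN = re.compile(r"[+-]?[0-9]+")
LONG_MIN = -(2 ** 63)
LONG_MAX = 2 ** 63 - 1


def parses_as_long(vid: str) -> bool:
    """Return True if vid looks like a value a signed 64-bit parse would accept."""
    if _LONG_PATTERN.fullmatch(vid) is None:
        return False
    return LONG_MIN <= int(vid) <= LONG_MAX


def find_non_numeric_vids(vids):
    """Collect the VIDs that would fail a string-to-long conversion."""
    return [vid for vid in vids if not parses_as_long(vid)]


# The UUID is the VID from the log above; the other two values are made up.
sample_vids = ["1001", "0033af94-95f2-ec6d-ac72-f75f4d00622a", "-42"]
bad_vids = find_non_numeric_vids(sample_vids)
print(bad_vids)
```

If the returned list is non-empty, the graph uses string VIDs that need to be encoded as integers before an algorithm without encodeId support can read them.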

wey-gu · 2023-03-22T05:08:58Z

@Nicole00 I think encodeId:true is supported for the main entry of nebula-algorithm, or is it actually not?

wey-gu · 2023-03-22T05:09:25Z

And @porscheme you are using the latest version of nebula-algo, right?

porscheme · 2023-03-22T05:11:12Z

I cloned https://github.com/vesoft-inc/nebula-algorithm a few hours ago, so I'm using the latest version.

wey-gu · 2023-03-22T05:17:00Z

Oh, now I see: node2vec does not yet support encodeId, so for now you have to map the VIDs to integers yourself.
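
One way to do that mapping is the dense-rank encoding quoted earlier in this thread: give every distinct string VID a consecutive integer, run the algorithm on the integers, then translate the results back. The sketch below shows the idea in plain Python on a small in-memory edge list. It is not nebula-algorithm code, all function names are hypothetical, and on a real graph the same ranking would be computed with SparkSQL's dense_rank over the distinct VIDs.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]


def dense_rank_vids(edges: Iterable[Edge]) -> Dict[str, int]:
    """Map each distinct string VID to a 1-based integer in sorted order.

    Ranking the distinct, sorted VIDs gives the same numbers as a
    dense rank ordered by VID.
    """
    distinct = sorted({vid for src, dst in edges for vid in (src, dst)})
    return {vid: rank for rank, vid in enumerate(distinct, start=1)}


def encode_edges(edges: Iterable[Edge], mapping: Dict[str, int]) -> List[Tuple[int, int]]:
    """Rewrite every edge with integer endpoints."""
    return [(mapping[src], mapping[dst]) for src, dst in edges]


def invert(mapping: Dict[str, int]) -> Dict[int, str]:
    """Build the reverse lookup used to decode algorithm results."""
    return {rank: vid for vid, rank in mapping.items()}


# The first VID comes from the thread; "vid-b" and "vid-c" are made up.
edges = [
    ("0033af94-95f2-ec6d-ac72-f75f4d00622a", "vid-b"),
    ("vid-b", "vid-c"),
    ("vid-c", "0033af94-95f2-ec6d-ac72-f75f4d00622a"),
]
mapping = dense_rank_vids(edges)
encoded = encode_edges(edges, mapping)
decoder = invert(mapping)
print(encoded)
```

Keep the mapping alongside the algorithm output, because the integer IDs in the results only make sense once they are passed through the reverse lookup.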

QingZ11 · 2023-05-05T03:29:54Z

@porscheme Hi, as with the previous issue you created, this issue has been closed because it has had no updates for a long time. If you have any updates, feel free to reopen it.

Again, thanks a lot for your contribution anyway 😊

wey-gu added the "doc affected" label (Solution: improvements or additions to documentation) Mar 22, 2023

wey-gu mentioned this issue Mar 22, 2023

docs regarding recent spark utils update(new features) vesoft-inc/nebula-docs-cn#2671

Open

wey-gu mentioned this issue Mar 25, 2023

Weekly Report 2023-03-24 vesoft-inc/nebula-community#393

Closed

QingZ11 closed this as completed May 5, 2023

wey-gu mentioned this issue May 6, 2023

Weekly Report 2023-05-05 vesoft-inc/nebula-community#400

Closed
