Spark sstfile generator #420

spacewalkman · 2019-05-22T14:14:08Z

Reopened as a new PR after this repo changed from private to public; replaces PR#208.

A Spark job that does the following:

parses an input mapping file that maps a Hive table to a tag/edge, in which the table's (logical) primary key should be identified
uses the Nebula native client to encode a tag's key and values
defines a custom Hadoop OutputFormat and RecordWriter, which should generate one sub-directory per partition per worker under the specified SST file output dir
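As a rough illustration of the last point, the sketch below composes an output path with one sub-directory per partition. It is not taken from the PR's source: the class `SstPathSketch`, the method `sstFilePath`, the `workerId` and `sequence` parameters, and the exact file-name pattern are all assumptions made for the example. Only the per-partition sub-directory and the `.sst` suffix come from this thread.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for illustration only; names and layout are assumptions. */
final class SstPathSketch {

    /**
     * Builds outputDir/partitionId/workerId-sequence.sst.
     * The sequence number is an assumed way to keep file names unique per worker.
     */
    static Path sstFilePath(String outputDir, int partitionId, String workerId, int sequence) {
        if (partitionId < 0 || sequence < 0) {
            throw new IllegalArgumentException("partitionId and sequence must be non-negative");
        }
        String fileName = workerId + "-" + sequence + ".sst";
        return Paths.get(outputDir, Integer.toString(partitionId), fileName);
    }

    public static void main(String[] args) {
        // Example call with made-up values.
        System.out.println(sstFilePath("/tmp/sst", 3, "worker-a", 0));
    }
}
```

A real RecordWriter would also have to create the parent directory before opening the file, which one of the later commits in this PR addresses.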

nebula-community-bot · 2019-05-22T14:15:04Z

Can one of the admins verify this patch?

sherman-the-tank · 2019-05-22T15:12:26Z

jenkins go

nebula-community-bot · 2019-05-22T15:15:38Z

Unit testing failed.

spacewalkman · 2019-05-23T01:20:21Z

The CI failure seems to be related to the JNI header. I've re-pushed; please let Jenkins go.

dangleptr · 2019-05-23T02:45:54Z

Jenkins go

nebula-community-bot · 2019-05-23T02:48:54Z

Unit testing failed.

dangleptr · 2019-05-24T02:15:18Z

Jenkins go

nebula-community-bot · 2019-05-24T02:25:35Z

Unit testing failed.

dangleptr · 2019-05-27T06:14:38Z

Jenkins go

nebula-community-bot · 2019-05-27T06:15:57Z

Unit testing failed.

dangleptr · 2019-05-29T03:13:57Z

Is the pr ready now? @spacewalkman

spacewalkman · 2019-05-29T13:21:09Z

@dangleptr There is a specific data skew problem causing OOM; I need to analyze the input data.

dangleptr · 2019-06-16T00:58:31Z

Is the PR ready now? @spacewalkman

spacewalkman · 2019-06-16T01:20:24Z

Yes. It's ready now.

dangleptr · 2019-06-17T02:28:34Z

Jenkins go

nebula-community-bot · 2019-06-17T02:40:58Z

Unit testing failed.

dangleptr · 2019-06-17T03:53:13Z

Jenkins go

nebula-community-bot · 2019-06-17T04:04:22Z

Unit testing failed.

spacewalkman · 2019-06-27T07:54:58Z

Jenkins, go

nebula-community-bot · 2019-06-27T08:08:46Z

Unit testing passed.

nebula-community-bot · 2019-06-27T08:32:22Z

Unit testing passed.

nebula-community-bot · 2019-06-27T08:37:35Z

Unit testing passed.

* [WIP] add spark sst file generator job to generate sst files per partiton per woker
* add edge's from and to column reference in mapping file, do check those columns exist
* make sure vertex and its outbound edges are in the same partition
* add native client unit test
* manual boxing AnyVal to AnyRef in order to call NativeCLient.encoded ,for that scala has no autoboxing feature like java
* support hive table with date and other partitin columns
* fix double free exception
* remove all rockdbjni related dependency
* use repartitionAndSortWithinPartitions to avoid overlapping sst files key range, update dependency to hadoop 2..7.4
* add mapping file and command line reference, handle mapping load problem
* address comments
* remove duplicate cmake instruction to find JNI header
* fix doc inconsistance
* keep all edges to a single edgeType
* fix flaky UT
* add mapping json schema file and example mapping file
* use hdfs -copyFromLocal to put local sst files to HDFS
* create destination hdfs dir to put sst files before run hdfs -copyFromLocal
* refactor and fix bug when vertex table has only one primary key column but no other column
* edge_type encoded as a property and, clean up local sst file dir and refactor key-value type name
* create parent dir first before creating local sst files
* set java.library.path env variable before run UT in maven surefire pulgin
* files generated suffix with .sst
* COMPILE phase precede PACKAGE phase in default maven lifecycle,so remove redundant COMPILE and enable test in the meantime
* fix build failure caused by imcompatability between maven 3.0.5 and surefire plugin 3.0.0-M2
* add some clearfication about sst file name uniqueness in doc

Co-authored-by: shylock <[email protected]>
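The squashed commit message mentions using repartitionAndSortWithinPartitions to avoid overlapping SST file key ranges. The plain-Java sketch below imitates that idea without Spark, under stated assumptions: keys are strings, the partition is chosen by hash, and each partition's sorted keys are cut into consecutive files. The class `SortedPartitionSketch` and all of its methods are hypothetical, and the PR's actual partitioner and key encoding may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical simulation; it does not use Spark or the PR's code. */
final class SortedPartitionSketch {

    /** Groups keys by an assumed hash partitioner and keeps each group sorted. */
    static Map<Integer, TreeSet<String>> partitionAndSort(List<String> keys, int numPartitions) {
        Map<Integer, TreeSet<String>> partitions = new TreeMap<>();
        for (String key : keys) {
            int partition = Math.floorMod(key.hashCode(), numPartitions);
            partitions.computeIfAbsent(partition, p -> new TreeSet<>()).add(key);
        }
        return partitions;
    }

    /** Cuts one partition's sorted keys into consecutive files of at most maxKeysPerFile keys. */
    static List<List<String>> splitIntoFiles(TreeSet<String> sortedKeys, int maxKeysPerFile) {
        if (maxKeysPerFile <= 0) {
            throw new IllegalArgumentException("maxKeysPerFile must be positive");
        }
        List<List<String>> files = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String key : sortedKeys) {
            current.add(key);
            if (current.size() == maxKeysPerFile) {
                files.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            files.add(current);
        }
        return files;
    }

    /** True when each file's last key sorts strictly before the next file's first key. */
    static boolean rangesDisjoint(List<List<String>> files) {
        for (int i = 1; i < files.size(); i++) {
            List<String> previous = files.get(i - 1);
            String previousMax = previous.get(previous.size() - 1);
            String nextMin = files.get(i).get(0);
            if (previousMax.compareTo(nextMin) >= 0) {
                return false;
            }
        }
        return true;
    }
}
```

Because the keys inside a partition are already sorted when they are written out, files produced one after another from the same partition cover disjoint key ranges. That appears to be the property the commit is after.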

spacewalkman commented May 22, 2019

src/tools/spark-sstfile-generator/README.md (outdated)

spacewalkman mentioned this pull request May 22, 2019

add spark sst file generator job to generate sst files #208

Closed

sherman-the-tank added the ready-for-testing PR: ready for the CI test label May 22, 2019

spacewalkman force-pushed the spark-sstfile-generator branch 3 times, most recently from bf1f3d8 to 8f663e9, May 24, 2019 01:51

spacewalkman force-pushed the spark-sstfile-generator branch 2 times, most recently from 1b2041d to 3c6955a, May 24, 2019 07:25

dangleptr requested review from darionyaphet and dangleptr May 27, 2019 06:14

spacewalkman force-pushed the spark-sstfile-generator branch from 3c6955a to 6af86d4, May 31, 2019 03:42

spacewalkman force-pushed the spark-sstfile-generator branch from 6af86d4 to 78cd2b3, June 19, 2019 02:14

qianyong added 16 commits June 27, 2019 15:43

address comments

4749f81

remove duplicate cmake instruction to find JNI header

3f549ce

fix doc inconsistance

c2c4cbb

keep all edges to a single edgeType

a8ab4f9

fix flaky UT

4031d00

add mapping json schema file and example mapping file

73284da

use hdfs -copyFromLocal to put local sst files to HDFS

bbf5d60

create destination hdfs dir to put sst files before run hdfs -copyFromLocal

54c8d27

refactor and fix bug when vertex table has only one primary key column but no other column

051497f

edge_type encoded as a property and, clean up local sst file dir and refactor key-value type name

bbc1984

create parent dir first before creating local sst files

6429ad9

set java.library.path env variable before run UT in maven surefire pulgin

d5b0605

files generated suffix with .sst

aecd778

COMPILE phase precede PACKAGE phase in default maven lifecycle,so remove redundant COMPILE and enable test in the meantime

ee3d36b

fix build failure caused by imcompatability between maven 3.0.5 and surefire plugin 3.0.0-M2

da74433

add some clearfication about sst file name uniqueness in doc

a63abbf

spacewalkman dismissed stale reviews from dutor and dangleptr via a63abbf June 27, 2019 07:53

spacewalkman force-pushed the spark-sstfile-generator branch from bc677f7 to a63abbf, June 27, 2019 07:53

dangleptr approved these changes Jun 27, 2019

darionyaphet reviewed Jun 27, 2019

src/tools/native-client/src/main/java/com/vesoft/client/NativeClient.java

darionyaphet reviewed Jun 27, 2019

src/tools/spark-sstfile-generator/build.sbt

darionyaphet approved these changes Jun 27, 2019

dangleptr merged commit 34eb36d into vesoft-inc:master Jun 27, 2019

yixinglu pushed a commit to yixinglu/nebula that referenced this pull request Mar 21, 2022

Disable the bidirection edges (vesoft-inc#420)

2e9018f

Co-authored-by: shylock <[email protected]>
