Spark sstfile generator #420
Can one of the admins verify this patch?
jenkins go
Unit testing failed.
The CI failure seems to be related to the JNI header. Repushed; please let Jenkins go.
Jenkins go
Unit testing failed.
bf1f3d8 to 8f663e9
Jenkins go
Unit testing failed.
1b2041d to 3c6955a
Jenkins go
Unit testing failed.
Is the PR ready now? @spacewalkman
@dangleptr A data skew problem is causing OOM; I need to analyze the input data.
3c6955a to 6af86d4
Is the PR ready now? @spacewalkman
Yes. It's ready now.
Jenkins go
Unit testing failed.
Jenkins go
Unit testing failed.
6af86d4 to 78cd2b3
…n but no other column
…refactor key-value type name
…ove redundant COMPILE and enable test in the meantime
…urefire plugin 3.0.0-M2
bc677f7 to a63abbf
Jenkins, go
Unit testing passed.
Unit testing passed.
src/tools/native-client/src/main/java/com/vesoft/client/NativeClient.java
Unit testing passed.
* [WIP] add spark sst file generator job to generate sst files per partition per worker
* add edge's from and to column reference in mapping file, and check that those columns exist
* make sure vertex and its outbound edges are in the same partition
* add native client unit test
* manually box AnyVal to AnyRef in order to call NativeClient.encoded, because Scala has no autoboxing feature like Java
* support hive table with date and other partition columns
* fix double free exception
* remove all rocksdbjni related dependency
* use repartitionAndSortWithinPartitions to avoid overlapping sst files key range, update dependency to hadoop 2.7.4
* add mapping file and command line reference, handle mapping load problem
* address comments
* remove duplicate cmake instruction to find JNI header
* fix doc inconsistency
* keep all edges to a single edgeType
* fix flaky UT
* add mapping json schema file and example mapping file
* use hdfs -copyFromLocal to put local sst files to HDFS
* create destination hdfs dir to put sst files before running hdfs -copyFromLocal
* refactor and fix bug when vertex table has only one primary key column but no other column
* edge_type encoded as a property, clean up local sst file dir and refactor key-value type name
* create parent dir first before creating local sst files
* set java.library.path env variable before running UT in maven surefire plugin
* files generated are suffixed with .sst
* COMPILE phase precedes PACKAGE phase in default maven lifecycle, so remove redundant COMPILE and enable test in the meantime
* fix build failure caused by incompatibility between maven 3.0.5 and surefire plugin 3.0.0-M2
* add some clarification about sst file name uniqueness in doc
Co-authored-by: shylock <[email protected]>
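Two items in the commit list above (keeping a vertex and its outbound edges in the same partition, and sorting within each partition so sst key ranges do not overlap) can be illustrated without Spark. The following is only a rough plain-Java sketch of the idea, not the code in this PR: the `Row` class, the string key format, and the modulo partition function are all hypothetical stand-ins for the real key encoding and partitioner.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PartitionSortSketch {
    /** Hypothetical input row: a vertex (dst == null) or an edge src -> dst. */
    static final class Row {
        final long src;
        final Long dst;

        Row(long src, Long dst) {
            this.src = src;
            this.dst = dst;
        }

        /** Assumed, human-readable key; the real job encodes keys via the native client. */
        String key() {
            return dst == null ? src + "|tag" : src + "|edge|" + dst;
        }
    }

    /** Assumption: partition is chosen from the source vertex id by modulo. */
    static int partitionOf(long vertexId, int numPartitions) {
        return (int) Math.floorMod(vertexId, (long) numPartitions);
    }

    /**
     * Groups rows by the partition of their source vertex, then sorts the keys
     * inside each partition, mimicking what repartitionAndSortWithinPartitions
     * provides in Spark.
     */
    static Map<Integer, List<String>> partitionAndSort(List<Row> rows, int numPartitions) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (Row row : rows) {
            int partition = partitionOf(row.src, numPartitions);
            byPartition.computeIfAbsent(partition, p -> new ArrayList<>()).add(row.key());
        }
        for (List<String> keys : byPartition.values()) {
            Collections.sort(keys);
        }
        return byPartition;
    }
}
```

Because an edge is routed by its source vertex id, it always lands in the same partition as that vertex, and each partition's keys are already in sorted order when they are written out.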
Reopened as a new PR after this repo changed from private to public; replaces PR#208.
A Spark job that does the following:
* Parse an input mapping file that maps a Hive table to a tag/edge; the table's (logical) primary key must be identified in it.
* Use the Nebula native client to encode a tag's key and values.
* Define a custom Hadoop OutputFormat and RecordWriter that generate a sub dir for each partition per worker under the specified sst file output dir.
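
The output layout described in the last step might look roughly like the sketch below. It is written against `java.nio.file` only, not the Hadoop `OutputFormat`/`RecordWriter` API, and the concrete layout `<outputDir>/<partitionId>/<workerId>.sst` along with the class and method names are assumptions for illustration; the PR's actual naming scheme may differ.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class SstOutputLayoutSketch {
    /**
     * Assumed layout: one sub-directory per partition, one .sst file per worker
     * inside it. The worker id keeps file names unique within a partition.
     */
    static Path sstFilePath(Path outputDir, int partitionId, String workerId) {
        return outputDir.resolve(Integer.toString(partitionId)).resolve(workerId + ".sst");
    }

    /** Creates the parent directory first, then the (empty) sst file itself. */
    static Path createSstFile(Path outputDir, int partitionId, String workerId) {
        Path file = sstFilePath(outputDir, partitionId, workerId);
        try {
            Files.createDirectories(file.getParent());
            return Files.createFile(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Creating the parent directory before the file mirrors the "create parent dir first" fix from the commit list, and the `.sst` suffix follows the "files generated are suffixed with .sst" item.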