NAME
neptune-export.sh export-pg - Export property graph from Neptune to CSV
or JSON.
SYNOPSIS
neptune-export.sh export-pg
[ --alb-endpoint <applicationLoadBalancerEndpoint> ]
[ --approx-edge-count <approxEdgeCount> ]
[ --approx-node-count <approxNodeCount> ]
[ {-b | --batch-size} <batchSize> ]
[ {-c | --config-file | --filter-config-file} <configFile> ]
[ --clone-cluster ]
[ --clone-cluster-correlation-id <cloneCorrelationId> ]
[ --clone-cluster-enable-audit-logs ]
[ --clone-cluster-instance-type <cloneClusterInstanceType> ]
[ --clone-cluster-replica-count <replicaCount> ]
[ {--cluster-id | --cluster | --clusterid} <clusterId> ]
[ {-cn | --concurrency} <concurrency> ]
[ {--config | --filter} <configJson> ] {-d | --dir} <directory>
[ --disable-ssl ] [ {-e | --endpoint} <endpoint>... ]
[ --edge-label-strategy <edgeLabelStrategy> ]
[ {-el | --edge-label} <edgeLabels>... ]
[ --escape-csv-headers ] [ --escape-newline ]
[ --exclude-type-definitions ] [ --export-id <exportId> ]
[ --filter-edges-early ] [ --format <format> ]
[ --gremlin-edge-filter <gremlinEdgeFilter> ]
[ --gremlin-filter <gremlinFilter> ]
[ --gremlin-node-filter <gremlinNodeFilter> ]
[ --include-last-event-id ] [ --janus ]
[ --lb-port <loadBalancerPort> ] [ --limit <limit> ]
[ --log-level <log level> ]
[ --max-content-length <maxContentLength> ] [ --merge-files ]
[ --multi-value-separator <multiValueSeparator> ]
[ {-nl | --node-label} <nodeLabels>... ]
[ --nlb-endpoint <networkLoadBalancerEndpoint> ]
[ {-o | --output} <output> ] [ {-p | --port} <port> ]
[ --partition-directories <partitionDirectories> ]
[ --per-label-directories ] [ --profile <profiles>... ]
[ {-r | --range | --range-size} <rangeSize> ]
[ {--region | --stream-region} <region> ]
[ {-s | --scope} <scope> ] [ --serializer <serializer> ]
[ --skip <skip> ]
[ --stream-large-record-strategy <largeStreamRecordHandlingStrategy> ]
[ --stream-name <streamName> ] [ --stream-role-arn <streamRoleArn> ]
[ --stream-role-external-id <streamRoleExternalId> ]
[ --stream-role-session-name <streamRoleSessionName> ] [ --strict-cardinality ]
[ {-t | --tag} <tag> ] [ --token-prefix <tokenPrefix> ]
[ --tokens-only <tokensOnly> ] [ --use-iam-auth ] [ --use-ssl ]
OPTIONS
--alb-endpoint <applicationLoadBalancerEndpoint>
Application load balancer endpoint (optional: use only if
connecting to an IAM DB enabled Neptune cluster through an
application load balancer (ALB) – see https://github.com/aws-samples/aws-dbs-refarch-graph/tree/master/src/connecting-using-a-load-balancer#connecting-to-amazon-neptune-from-clients-outside-the-neptune-vpc-using-aws-application-load-balancer).
This option may occur a maximum of 1 times
This option is part of the group 'load-balancer' from which only
one option may be specified
--approx-edge-count <approxEdgeCount>
Approximate number of edges in the graph.
This option may occur a maximum of 1 times
--approx-node-count <approxNodeCount>
Approximate number of nodes in the graph.
This option may occur a maximum of 1 times
-b <batchSize>, --batch-size <batchSize>
Batch size (optional, default 64). Reduce this number if your
queries trigger CorruptedFrameExceptions.
This option may occur a maximum of 1 times
-c <configFile>, --config-file <configFile>, --filter-config-file
<configFile>
Path to JSON schema config file (file path, or 'https' or 's3'
URI).
This option is part of the group 'configFile or config' from which
only one option may be specified
--clone-cluster
Clone an Amazon Neptune cluster.
This option may occur a maximum of 1 times
--clone-cluster-correlation-id <cloneCorrelationId>
Correlation ID to be added to a correlation-id tag on the cloned
cluster.
This option may occur a maximum of 1 times
--clone-cluster-enable-audit-logs
Enables audit logging on the cloned cluster.
This option may occur a maximum of 1 times
--clone-cluster-instance-type <cloneClusterInstanceType>
Instance type for cloned cluster (by default neptune-export will
use the same instance type as the source cluster).
This option's value is restricted to the following set of values:
db.r4.large
db.r4.xlarge
db.r4.2xlarge
db.r4.4xlarge
db.r4.8xlarge
db.r5.large
db.r5.xlarge
db.r5.2xlarge
db.r5.4xlarge
db.r5.8xlarge
db.r5.12xlarge
db.r5.16xlarge
db.r5.24xlarge
db.r5d.large
db.r5d.xlarge
db.r5d.2xlarge
db.r5d.4xlarge
db.r5d.8xlarge
db.r5d.12xlarge
db.r5d.16xlarge
db.r5d.24xlarge
db.r6g.large
db.r6g.xlarge
db.r6g.2xlarge
db.r6g.4xlarge
db.r6g.8xlarge
db.r6g.12xlarge
db.r6g.16xlarge
db.x2g.large
db.x2g.xlarge
db.x2g.2xlarge
db.x2g.4xlarge
db.x2g.8xlarge
db.x2g.12xlarge
db.x2g.16xlarge
db.t3.medium
db.t4g.medium
r4.large
r4.xlarge
r4.2xlarge
r4.4xlarge
r4.8xlarge
r5.large
r5.xlarge
r5.2xlarge
r5.4xlarge
r5.8xlarge
r5.12xlarge
r5.16xlarge
r5.24xlarge
r5d.large
r5d.xlarge
r5d.2xlarge
r5d.4xlarge
r5d.8xlarge
r5d.12xlarge
r5d.16xlarge
r5d.24xlarge
r6g.large
r6g.xlarge
r6g.2xlarge
r6g.4xlarge
r6g.8xlarge
r6g.12xlarge
r6g.16xlarge
x2g.large
x2g.xlarge
x2g.2xlarge
x2g.4xlarge
x2g.8xlarge
x2g.12xlarge
x2g.16xlarge
t3.medium
t4g.medium
This option may occur a maximum of 1 times
--clone-cluster-replica-count <replicaCount>
Number of read replicas to add to the cloned cluster (optional, default 0).
This option may occur a maximum of 1 times
This option's value must fall in the following range: 0 <= value <= 15
--cluster-id <clusterId>, --cluster <clusterId>, --clusterid
<clusterId>
ID of an Amazon Neptune cluster. If you specify a cluster ID,
neptune-export will use all of the instance endpoints in the
cluster in addition to any endpoints you have specified using the
endpoint options.
This option may occur a maximum of 1 times
This option is part of the group 'endpoint or clusterId' from which
at least one option must be specified
-cn <concurrency>, --concurrency <concurrency>
Concurrency – the number of parallel queries used to run the export
(optional, default 4).
This option may occur a maximum of 1 times
--config <configJson>, --filter <configJson>
JSON schema for property graph.
This option is part of the group 'configFile or config' from which
only one option may be specified
-d <directory>, --dir <directory>
Root directory for output.
This option may occur a maximum of 1 times
This option's value must be a path to a directory. The provided path
must be readable and writable.
--disable-ssl
Disables connectivity over SSL.
This option may occur a maximum of 1 times
-e <endpoint>, --endpoint <endpoint>
Neptune endpoint(s) – supply multiple instance endpoints if you
want to load balance requests across a cluster.
This option is part of the group 'endpoint or clusterId' from which
at least one option must be specified
--edge-label-strategy <edgeLabelStrategy>
Export edges by their edge labels, or by a combination of their
start vertex label, edge label, and end vertex label (optional,
default 'edgeLabelsOnly').
This option's value is restricted to the following set of values:
edgeLabelsOnly
edgeAndVertexLabels
This option may occur a maximum of 1 times
-el <edgeLabels>, --edge-label <edgeLabels>
Labels of edges to be included in config (optional, default all
labels).
--escape-csv-headers
Escape characters in CSV column headers (optional, default
'false').
This option may occur a maximum of 1 times
--escape-newline
Escape newline characters in CSV files (optional, default 'false').
This option may occur a maximum of 1 times
--exclude-type-definitions
Exclude type definitions from CSV column headers (optional, default
'false').
This option may occur a maximum of 1 times
--export-id <exportId>
Export ID.
This option may occur a maximum of 1 times
--filter-edges-early
Forces gremlinFilters to apply before the range() step which breaks up
concurrent traversals. This may lead to improved performance in cases where the
gremlinFilters are efficient and filter out the majority of edges.
--format <format>
Output format (optional, default 'csv').
This option's value is restricted to the following set of values:
json
csv
csvNoHeaders
neptuneStreamsJson
neptuneStreamsSimpleJson
This option may occur a maximum of 1 times
--gremlin-edge-filter <gremlinEdgeFilter>
Gremlin steps for filtering edges (overrides --gremlin-filter).
This option may occur a maximum of 1 times
--gremlin-filter <gremlinFilter>
Gremlin steps for filtering nodes and edges.
This option may occur a maximum of 1 times
--gremlin-node-filter <gremlinNodeFilter>
Gremlin steps for filtering nodes (overrides --gremlin-filter).
This option may occur a maximum of 1 times
--include-last-event-id
Get the last event ID from the Amazon Neptune stream, if enabled,
and save it to a JSON file (optional, default 'false').
This option may occur a maximum of 1 times
--janus
Use JanusGraph serializer.
This option may occur a maximum of 1 times
--lb-port <loadBalancerPort>
Load balancer port (optional, default 80).
This option may occur a maximum of 1 times
This option's value represents a port and must fall in one of the
following port ranges: 1-1023, 1024-49151
--limit <limit>
Maximum number of items to export (optional).
This option may occur a maximum of 1 times
--log-level <log level>
Log level (optional, default 'error').
This option's value is restricted to the following set of values:
trace
debug
info
warn
error
This option may occur a maximum of 1 times
--max-content-length <maxContentLength>
Max content length (optional, default 50000000).
This option may occur a maximum of 1 times
--merge-files
Merge files for each vertex or edge label (currently only supports
CSV files for export-pg).
This option may occur a maximum of 1 times
--multi-value-separator <multiValueSeparator>
Separator for multi-value properties in CSV output (optional,
default ';').
This option may occur a maximum of 1 times
-nl <nodeLabels>, --node-label <nodeLabels>
Labels of nodes to be included in config (optional, default all
labels).
--nlb-endpoint <networkLoadBalancerEndpoint>
Network load balancer endpoint (optional: use only if connecting to
an IAM DB enabled Neptune cluster through a network load balancer
(NLB) – see https://github.com/aws-samples/aws-dbs-refarch-graph/tree/master/src/connecting-using-a-load-balancer#connecting-to-amazon-neptune-from-clients-outside-the-neptune-vpc-using-aws-network-load-balancer).
This option may occur a maximum of 1 times
This option is part of the group 'load-balancer' from which only
one option may be specified
-o <output>, --output <output>
Output target (optional, default 'files').
This option's value is restricted to the following set of values:
files
stdout
devnull
stream
This option may occur a maximum of 1 times
-p <port>, --port <port>
Neptune port (optional, default 8182).
This option may occur a maximum of 1 times
This option's value represents a port and must fall in one of the
following port ranges: 1-1023, 1024-49151
--partition-directories <partitionDirectories>
Partition directory path (e.g. 'year=2021/month=07/day=21').
This option may occur a maximum of 1 times
--per-label-directories
Create a subdirectory for each distinct vertex or edge label.
This option may occur a maximum of 1 times
--profile <profiles>
Name of an export profile.
-r <rangeSize>, --range <rangeSize>, --range-size <rangeSize>
Number of items to fetch per request (optional).
This option may occur a maximum of 1 times
--region <region>, --stream-region <region>
AWS Region in which your Amazon Kinesis Data Stream is located.
This option may occur a maximum of 1 times
-s <scope>, --scope <scope>
Scope (optional, default 'all').
This option's value is restricted to the following set of values:
all
nodes
edges
This option may occur a maximum of 1 times
--serializer <serializer>
Message serializer (optional, default 'GRAPHBINARY_V1D0').
This option's value is restricted to the following set of values:
GRAPHSON
GRAPHSON_V1D0
GRAPHSON_V2D0
GRAPHSON_V3D0
GRAPHBINARY_V1D0
GRYO_V1D0
GRYO_V3D0
GRYO_LITE_V1D0
This option may occur a maximum of 1 times
--skip <skip>
Number of items to skip (optional).
This option may occur a maximum of 1 times
--stream-large-record-strategy <largeStreamRecordHandlingStrategy>
Strategy for dealing with records to be sent to Amazon Kinesis that
are larger than 1 MB.
This option's value is restricted to the following set of values:
dropAll
splitAndDrop
splitAndShred
This option may occur a maximum of 1 times
--stream-name <streamName>
Name of an Amazon Kinesis Data Stream.
This option may occur a maximum of 1 times
--stream-role-arn <streamRoleArn>
Role to be assumed when uploading results to an Amazon Kinesis Data Stream.
If this option is not specified, the upload to Kinesis uses the credentials
found by the DefaultAWSCredentialsProviderChain.
This option may occur a maximum of 1 times
--stream-role-external-id <streamRoleExternalId>
External ID to be used when assuming the role defined by --stream-role-arn.
This option may occur a maximum of 1 times
--stream-role-session-name <streamRoleSessionName>
Session name to be used when assuming the role defined by --stream-role-arn.
This option may occur a maximum of 1 times
--strict-cardinality
Format all set and list cardinality properties as arrays in JSON,
including properties with a single value (optional, default
'false').
This option may occur a maximum of 1 times
-t <tag>, --tag <tag>
Directory prefix (optional).
This option may occur a maximum of 1 times
--token-prefix <tokenPrefix>
Token prefix (optional, default '~').
This option may occur a maximum of 1 times
--tokens-only <tokensOnly>
Export tokens (~id, ~label, ~from, ~to) only (optional, default
'off').
This option's value is restricted to the following set of values:
off
nodes
edges
both
This option may occur a maximum of 1 times
--use-iam-auth
Use IAM database authentication to authenticate to Neptune
(remember to set the SERVICE_REGION environment variable).
This option may occur a maximum of 1 times
--use-ssl
Enables connectivity over SSL. This option is
deprecated: neptune-export will always connect via SSL unless you
use --disable-ssl to explicitly disable connectivity over SSL.
This option may occur a maximum of 1 times
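Several of the options above carry value constraints (port ranges, the replica
count range, and the 'load-balancer' group from which only one option may be
specified). These can be checked in a wrapper script before the tool is
invoked. The following is a minimal sketch under stated assumptions: the
helper functions valid_port, valid_replica_count and one_lb_endpoint_at_most
are hypothetical and are not part of neptune-export; they only restate the
ranges documented above.

```shell
#!/bin/sh
# Hypothetical pre-flight checks; these helpers are NOT part of neptune-export.

# Succeeds if $1 is a non-empty string of decimal digits.
is_uint() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  return 0
}

# -p/--port and --lb-port: documented ranges 1-1023 and 1024-49151,
# which together cover 1..49151.
valid_port() {
  is_uint "$1" && [ "$1" -ge 1 ] && [ "$1" -le 49151 ]
}

# --clone-cluster-replica-count: documented range 0 <= value <= 15.
valid_replica_count() {
  is_uint "$1" && [ "$1" -ge 0 ] && [ "$1" -le 15 ]
}

# 'load-balancer' group: only one of --alb-endpoint / --nlb-endpoint.
# $1 = ALB endpoint (may be empty), $2 = NLB endpoint (may be empty).
one_lb_endpoint_at_most() {
  [ -z "$1" ] || [ -z "$2" ]
}
```

A wrapper could call, for example, valid_port "$PORT" || exit 1 before
building the neptune-export.sh command line.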
EXAMPLES
bin/neptune-export.sh export-pg -e neptunedbcluster-xxxxxxxxxxxx.cluster-yyyyyyyyyyyy.us-east-1.neptune.amazonaws.com -d /home/ec2-user/output
Export all data to the /home/ec2-user/output directory
bin/neptune-export.sh export-pg -e neptunedbcluster-xxxxxxxxxxxx.cluster-yyyyyyyyyyyy.us-east-1.neptune.amazonaws.com -d /home/ec2-user/output --format json
Export all data to the /home/ec2-user/output directory as JSON
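The output target and format options can also be combined to send results to
an Amazon Kinesis Data Stream. The following is a sketch only: it assembles
and prints a command line rather than running it, the function name
stream_export_cmd and the example stream name are hypothetical, and the
particular combination of flags is an assumption based on the option
descriptions above (it has not been verified against a live cluster).

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints an export-pg command that targets Kinesis.
# $1 = Neptune endpoint     $2 = output directory (-d is required)
# $3 = Kinesis stream name  $4 = AWS Region of the stream
stream_export_cmd() {
  printf 'bin/neptune-export.sh export-pg'
  printf ' -e %s' "$1"
  printf ' -d %s' "$2"
  # 'stream' and 'neptuneStreamsJson' are taken from the allowed value lists.
  printf ' -o stream --format neptuneStreamsJson'
  printf ' --stream-name %s --region %s' "$3" "$4"
  # Records larger than 1 MB are handled by the chosen strategy.
  printf ' --stream-large-record-strategy splitAndShred'
  printf '\n'
}

# Example (all values are placeholders):
stream_export_cmd \
  "neptunedbcluster-xxxxxxxxxxxx.cluster-yyyyyyyyyyyy.us-east-1.neptune.amazonaws.com" \
  "/home/ec2-user/output" "my-export-stream" "us-east-1"
```

Review the printed command before running it; add --stream-role-arn if the
upload should assume a specific role.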
bin/neptune-export.sh export-pg -e neptunedbcluster-xxxxxxxxxxxx.cluster-yyyyyyyyyyyy.us-east-1.neptune.amazonaws.com -d /home/ec2-user/output -s nodes
Export only nodes to the /home/ec2-user/output directory
bin/neptune-export.sh export-pg -e neptunedbcluster-xxxxxxxxxxxx.cluster-yyyyyyyyyyyy.us-east-1.neptune.amazonaws.com -d /home/ec2-user/output -nl User -el FOLLOWS
Export only User nodes and FOLLOWS relationships
bin/neptune-export.sh export-pg -e neptunedbcluster-xxxxxxxxxxxx.cluster-yyyyyyyyyyyy.us-east-1.neptune.amazonaws.com -d /home/ec2-user/output -cn 2
Parallel export using 2 threads
bin/neptune-export.sh export-pg -e neptunedbcluster-xxxxxxxxxxxx.cluster-yyyyyyyyyyyy.us-east-1.neptune.amazonaws.com -d /home/ec2-user/output -cn 2 -r 1000
Parallel export using 2 threads, with each thread fetching
ranges of 1000 nodes or edges
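The cluster ID, cloning, and IAM authentication options can be combined in the
same way as the examples above. The following is a minimal sketch, assuming a
hypothetical cluster ID "my-neptune-cluster" and a hypothetical helper named
clone_export_cmd; it prints the command instead of executing it, and the
combination of flags is inferred from the option descriptions rather than
confirmed against a running cluster.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints an export-pg command that exports
# from a clone of the cluster using IAM database authentication.
# $1 = cluster ID   $2 = output directory   $3 = read replica count (0-15)
clone_export_cmd() {
  printf 'bin/neptune-export.sh export-pg'
  printf ' --cluster-id %s' "$1"
  printf ' -d %s' "$2"
  printf ' --use-iam-auth --clone-cluster'
  printf ' --clone-cluster-replica-count %s' "$3"
  printf '\n'
}

# --use-iam-auth expects SERVICE_REGION to be set in the environment.
SERVICE_REGION="us-east-1"
export SERVICE_REGION

clone_export_cmd "my-neptune-cluster" "/home/ec2-user/output" 2
# prints: bin/neptune-export.sh export-pg --cluster-id my-neptune-cluster -d /home/ec2-user/output --use-iam-auth --clone-cluster --clone-cluster-replica-count 2
```

Because --cluster-id belongs to the 'endpoint or clusterId' group, no -e
option is needed in this sketch.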