A minimalist command-line utility to pipe documents from a file or I/O stream into an Elasticsearch cluster.
Have you ever had thousands of sample documents in a file, and you just want to load them all into an unsecured local Elasticsearch cluster?
espipe docs.ndjson http://localhost:9200/new_index
And you're done.
The goal of espipe is to provide the simplest way to bulk-load a dataset into Elasticsearch. It does not do any document transformation or enrichment, and only requires that the inputs be valid, deserializable JSON objects in a newline-delimited JSON (.ndjson) file.
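For reference, an .ndjson input is simply one JSON object per line; a hypothetical docs.ndjson might look like this:
{"@timestamp": "2024-01-01T00:00:00Z", "message": "first event", "level": "info"}
{"@timestamp": "2024-01-01T00:00:01Z", "message": "second event", "level": "warn"}
{"@timestamp": "2024-01-01T00:00:02Z", "message": "third event", "level": "error"}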
It is multi-threaded and capable of fully saturating the CPU of the sending host. This could potentially overwhelm the target cluster, so use with caution on large data sets.
Documents are batched into _bulk requests of 5,000 documents and sent with the create action. It is not opinionated about whether the target is an alias, a regular index, or a data stream; just define your index templates and ingest pipelines in advance.
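For example, a minimal index template can be created ahead of time with the standard Elasticsearch _index_template API (the template and index names here are only illustrative):
curl -X PUT "http://localhost:9200/_index_template/new_index_template" \
  -H "Content-Type: application/json" \
  -d '{
        "index_patterns": ["new_index*"],
        "template": {
          "settings": { "number_of_shards": 1 }
        }
      }'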
- Make sure you have cargo installed from rust-lang.org
- Clone this repository to your local machine
- From the repository directory, run cargo install --path .
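As a sketch, the full install sequence looks like this (the repository URL below is a placeholder; substitute the real one):
# clone the repository (placeholder URL), then build and install with cargo
git clone https://github.com/example/espipe.git
cd espipe
cargo install --path .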
Usage: espipe [OPTIONS] <INPUT> <OUTPUT>
Arguments:
<INPUT> The input URI to read docs from
<OUTPUT> The output URI to send docs to
Options:
-k, --insecure Ignore certificate validation
-a, --apikey <APIKEY> Apikey to authenticate via http header
-u, --username <USERNAME> Username for authentication
-p, --password <PASSWORD> Password for authentication
-q, --quiet Quiet mode, don't print runtime summary
-h, --help Print help
Both the <INPUT> and <OUTPUT> arguments are URI-formatted strings.
The input URI can be:
- A stream from stdin: -
- An unqualified file path: file.ext, ~/dir/file.ext
- A fully-qualified file:// scheme URI: file:///Users/name/dir/file.ext
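For instance, each of these commands reads the same documents, just from a different input form (paths and index names are illustrative):
cat docs.ndjson | espipe - http://localhost:9200/new_index
espipe ~/data/docs.ndjson http://localhost:9200/new_index
espipe file:///Users/name/data/docs.ndjson http://localhost:9200/new_index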
The output URI can be:
- A stream to stdout: -
- An unqualified file path: file.ext, ~/dir/file.ext
- A fully-qualified file:// scheme URI: file:///Users/name/dir/file.ext
- An http:// or https:// scheme URL to an Elasticsearch cluster, including index name: http://example.com/index_name
- A known host saved in the ~/.esdiag/hosts.yml configuration file: localhost:index_name
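As a sketch, a few output forms (file and index names are illustrative):
# write documents to a local file
espipe docs.ndjson copy.ndjson
# stream documents to stdout
espipe docs.ndjson -
# send documents to an Elasticsearch index
espipe docs.ndjson http://localhost:9200/new_index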
When piping to an Elasticsearch output, the index name is required.
The authentication options apply only to an http(s) output.
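For example, using the authentication flags from the usage above (hostnames, credentials, and the API key are placeholders):
# API key authentication over https
espipe docs.ndjson https://example.es.cloud:9243/new_index --apikey="bXktZmFrZS1hcGkta2V5"
# basic authentication, skipping certificate validation for a self-signed cert
espipe docs.ndjson https://example.internal:9200/new_index -u elastic -p changeme -k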
You may create a ~/.esdiag/hosts.yml configuration file, much like a ~/.ssh/config file.
For example, here is a localhost definition with no authentication:
localhost:
auth: None
url: http://localhost:9200/
This allows you to use localhost as a shorthand for http://localhost:9200/. Both commands are equivalent:
espipe docs.ndjson http://localhost:9200/new_index
espipe docs.ndjson localhost:new_index
An Elasticsearch Service (ESS) cluster with API key authentication:
ess-cluster:
auth: Apikey
url: https://ess-cluster.es.us-west-2.aws.found.io/
apikey: "fak34p1k3ydcbcc2c134c3eb3bf967bcf67q=="
This enables you to use the shorthand; both commands are equivalent:
espipe docs.ndjson https://ess-cluster.es.us-west-2.aws.found.io/new_index --apikey="fak34p1k3ydcbcc2c134c3eb3bf967bcf67q=="
espipe docs.ndjson ess-cluster:new_index
If you need detailed logs on what espipe is doing, you can set the RUST_LOG environment variable:
export RUST_LOG=debug
espipe docs.ndjson https://esdiag.es.us-west-2.aws.found.io/new_index --apikey="fak34p1k3ydcbcc2c134c3eb3bf967bcf67q=="
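If the full debug output is too noisy, RUST_LOG presumably follows the standard Rust env_logger filter syntax, so you can scope the level to a single crate (the crate name espipe is an assumption):
# assumes env_logger-style filtering; scope trace output to the espipe crate
export RUST_LOG=espipe=trace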
- Define a shell function that finds all .ndjson files recursively, calling espipe on each:
  function espipe-find() { for file in $(find "$1" -name "*.ndjson"); do echo -n "$file > "; espipe "$file" "$2"; done }
- Call the espipe-find function with the directory and an output index matching the logs-*-* data stream template:
  espipe-find elastic-agent-123abc http://localhost:9200/logs-agent-default
This ingests all documents into a new data stream called logs-agent-default, making the logs visible in Kibana's Logs Explorer.
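To confirm the ingest worked, you can, for instance, check the document count with a standard Elasticsearch request (the index name matches the example above):
curl "http://localhost:9200/logs-agent-default/_count?pretty"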