My small contribution to the "Rewrite it in Rust!" bandwagon. The original was written in TypeScript.
This is a Rust command-line interface (CLI) tool that provides four commands for manipulating JSON and NDJSON files: split, merge, bundle, and unbundle.
Each command can read its input from, and write its output to, files, directories, or standard input/output wherever relevant.
After being rewritten in Rust it is roughly 2x as fast as the TypeScript implementation and can handle files larger than 512 MB.
This is published to crates.io so you can simply do a global install with:
cargo install jsrmx
Then jsrmx is executable from your shell:
jsrmx --help
There are four commands:
merge - merges multiple JSON objects into a single large JSON object
split - splits a single JSON object into multiple JSON objects by top-level keys
bundle - bundles multiple JSON objects into an NDJSON (newline-delimited JSON) series
unbundle - unbundles an NDJSON series into a collection of separate JSON objects
jsrmx merge <dir> [output]
<dir> - Required input directory
[output] - Optional output file name (default - for stdout)
-c, --compact - Compact single-line output objects
-f, --filter - Regular expression to filter output keys
-p, --pretty - Pretty-print output objects (default)
-t, --trim - File extension to trim from object key names
Given a directory named letters with six files:
letters/alpha.json
letters/bravo.json
letters/charlie.json
letters/delta.json
letters/echo.json
letters/foxtrot.json
Where each file contains a few properties:
// cat alpha.json
{
"uppercase": "A",
"lowercase": "a",
"position": 1
}
We can merge all the files into a single file:
jsrmx merge letters/ letters.json
The contents of letters.json then look like:
// cat letters.json
{
"alpha": {
"lowercase": "a",
"position": 1,
"uppercase": "A"
},
"bravo": {
"lowercase": "b",
"position": 2,
"uppercase": "B"
},
"charlie": {
"lowercase": "c",
"position": 3,
"uppercase": "C"
},
"delta": {
"lowercase": "d",
"position": 4,
"uppercase": "D"
},
"echo": {
"lowercase": "e",
"position": 5,
"uppercase": "E"
},
"foxtrot": {
"lowercase": "f",
"position": 6,
"uppercase": "F"
}
}
Note that the keys are sorted and have the .json extension trimmed from their names.
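To make the merge behaviour concrete, here is a minimal Python sketch of the same idea. It is an illustration only, not how jsrmx is implemented; the function names merge_dir and render are hypothetical, and the sketch assumes every file in the directory holds valid JSON.

```python
import json
from pathlib import Path


def merge_dir(src_dir, trim=".json"):
    """Merge every file in src_dir into one dict keyed by the trimmed file name.

    Hypothetical helper: mirrors the documented behaviour (sorted keys,
    extension trimmed), not the actual jsrmx code.
    """
    merged = {}
    for path in sorted(Path(src_dir).iterdir()):
        if not path.is_file():
            continue
        key = path.name
        # Drop the extension from the key, like the --trim option describes.
        if trim and key.endswith(trim):
            key = key[: -len(trim)]
        with path.open(encoding="utf-8") as fh:
            merged[key] = json.load(fh)
    return merged


def render(obj, compact=False):
    """Serialize with sorted keys, pretty-printed unless compact is requested."""
    if compact:
        return json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return json.dumps(obj, sort_keys=True, indent=2)
```

Calling render(merge_dir("letters/")) on the example directory would produce an object shaped like the letters.json shown above.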
jsrmx split [input] [output]
[input] - Optional input file name or - for stdin (default -)
[output] - Optional output directory or - for stdout (default -)
-c, --compact - Compact single-line output objects
-f, --filter - Regular expression to filter output keys
-p, --pretty - Pretty-print output objects (default)
We can split one file (or an object through stdin) into individually-named files:
jsrmx split letters.json letters/
Given the following single-object JSON file:
{
"alpha": {
"uppercase": "A",
"lowercase": "a",
"position": 1
},
"bravo": {
"uppercase": "B",
"lowercase": "b",
"position": 2
},
// ... 3 entries omitted
"foxtrot": {
"uppercase": "F",
"lowercase": "f",
"position": 6
}
}
The output files created will be:
letters/alpha.json
letters/bravo.json
letters/charlie.json
letters/delta.json
letters/echo.json
letters/foxtrot.json
Each file's contents will be the corresponding value from the large JSON object:
// cat alpha.json
{
"uppercase": "A",
"lowercase": "a",
"position": 1
}
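The split-to-directory behaviour can be sketched in a few lines of Python. This is only an approximation of what is described above; split_to_dir is a hypothetical name, and the sketch assumes the top-level keys are safe to use as file names.

```python
import json
from pathlib import Path


def split_to_dir(obj, out_dir, compact=False):
    """Write each top-level key of obj to <out_dir>/<key>.json.

    Hypothetical helper illustrating the documented behaviour only.
    Returns the list of file names written, in key order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for key, value in obj.items():
        if compact:
            text = json.dumps(value, separators=(",", ":"))
        else:
            text = json.dumps(value, indent=2)
        target = out / f"{key}.json"
        target.write_text(text + "\n", encoding="utf-8")
        written.append(target.name)
    return written
```

Note that only the value is written to each file; the top-level key survives as the file name.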
When the output goes to stdout, each object keeps its top-level key as a parent object. Using --filter can extract specific keys.
jsrmx split --filter delta big_object.json -
{
"delta": {
"uppercase": "D",
"lowercase": "d",
"position": 4
}
}
Combined with --compact this can convert a large object into an .ndjson file.
jsrmx split --compact big_object.json > letters.ndjson
// cat letters.ndjson
{"foxtrot":{"lowercase":"f","position":6,"uppercase":"F"}}
{"bravo":{"lowercase":"b","position":2,"uppercase":"B"}}
{"charlie":{"lowercase":"c","position":3,"uppercase":"C"}}
{"delta":{"lowercase":"d","position":4,"uppercase":"D"}}
{"echo":{"lowercase":"e","position":5,"uppercase":"E"}}
{"alpha":{"lowercase":"a","position":1,"uppercase":"A"}}
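The stdout mode of split, with its key filter and compact output, can be approximated as follows. This is a sketch under stated assumptions: split_lines is a hypothetical name, and it assumes the filter matches anywhere in the key (regex search rather than a full match), which the text above does not specify.

```python
import json
import re


def split_lines(obj, pattern=None):
    """Yield one compact JSON line per top-level key, keeping the key as parent.

    Hypothetical helper; `pattern` is an optional regular expression and is
    assumed here to match anywhere in the key.
    """
    regex = re.compile(pattern) if pattern else None
    for key, value in obj.items():
        if regex and not regex.search(key):
            continue
        # Wrap the value in its key so the line is self-describing.
        yield json.dumps({key: value}, separators=(",", ":"))
```

Joining the yielded lines with newlines gives NDJSON text in the same shape as the letters.ndjson example above.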
jsrmx bundle <dir> [output]
<dir> - Required target input directory
[output] - Optional output filename or - for stdout (default -)
-e, --escape - List of field paths to convert from nested JSON to an escaped string
We can convert a directory of .json files into a single .ndjson (newline-delimited JSON) file:
jsrmx bundle letters/ letters.ndjson
Given the files in the input directory:
letters/alpha.json
letters/bravo.json
letters/charlie.json
letters/delta.json
letters/echo.json
letters/foxtrot.json
With each file containing:
// cat alpha.json
{
"letter": {
"lowercase": "a",
"uppercase": "A"
},
"name": "alpha",
"position": 1
}
The output letters.ndjson will contain:
{"name":"alpha","letter":{"uppercase":"A","lowercase":"a"},"position":1}
{"name":"bravo","letter":{"uppercase":"B","lowercase":"b"},"position":2}
{"name":"charlie","letter":{"uppercase":"C","lowercase":"c"},"position":3}
{"name":"delta","letter":{"uppercase":"D","lowercase":"d"},"position":4}
{"name":"echo","letter":{"uppercase":"E","lowercase":"e"},"position":5}
{"name":"foxtrot","letter":{"uppercase":"F","lowercase":"f"},"position":6}
NOTE: the filenames are not retained when bundling .ndjson files.
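As a rough illustration of bundling, the Python sketch below reads every .json file in a directory and emits one compact line per file. The name bundle_dir is hypothetical, and processing files in sorted name order is an assumption made for reproducibility, not a documented guarantee of jsrmx.

```python
import json
from pathlib import Path


def bundle_dir(src_dir):
    """Return NDJSON text with one compact line per .json file in src_dir.

    Hypothetical helper: file names are used only for ordering and are not
    stored in the output, matching the note above.
    """
    lines = []
    for path in sorted(Path(src_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            record = json.load(fh)
        lines.append(json.dumps(record, separators=(",", ":")))
    return "".join(line + "\n" for line in lines)
```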
jsrmx unbundle [options] [input] [output]
[input] - Optional input file name (default - for stdin)
[output] - Optional output directory name (default - for stdout)
-c, --compact - Compact single-line output objects
-n, --name - A list of JSON paths to use for filenames (uses first non-null)
-p, --pretty - Pretty-print output objects (default)
-t, --type - A JSON path to use for filename suffix (before extension)
-u, --unescape - List of field paths to convert from escaped string to nested JSON
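The --unescape option reads as the inverse of the --escape option of bundle. The sketch below shows one plausible interpretation of that round trip in Python. It is an assumption, not the jsrmx implementation: the helper names are hypothetical, and field paths are assumed to be dot-delimited and to exist in the record.

```python
import json


def _parent_and_leaf(obj, dotted_path):
    """Walk to the dict that holds the final path segment (hypothetical helper)."""
    *parents, leaf = dotted_path.split(".")
    for part in parents:
        obj = obj[part]
    return obj, leaf


def escape_field(record, dotted_path):
    """Replace the nested JSON value at dotted_path with its string encoding."""
    parent, leaf = _parent_and_leaf(record, dotted_path)
    parent[leaf] = json.dumps(parent[leaf], separators=(",", ":"))
    return record


def unescape_field(record, dotted_path):
    """Parse the escaped string at dotted_path back into nested JSON."""
    parent, leaf = _parent_and_leaf(record, dotted_path)
    parent[leaf] = json.loads(parent[leaf])
    return record
```

Under this reading, escaping the letter field of an example record turns the nested object into a single string value, and unescaping restores the original object.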
Unbundling a file (or stdin) to a directory (or stdout):
jsrmx unbundle letters.ndjson letters/
Given the file letters.ndjson:
{"name":"alpha","letter":{"uppercase":"A","lowercase":"a"},"position":1}
{"name":"bravo","letter":{"uppercase":"B","lowercase":"b"},"position":2}
{"name":"charlie","letter":{"uppercase":"C","lowercase":"c"},"position":3}
{"name":"delta","letter":{"uppercase":"D","lowercase":"d"},"position":4}
{"name":"echo","letter":{"uppercase":"E","lowercase":"e"},"position":5}
{"name":"foxtrot","letter":{"uppercase":"F","lowercase":"f"},"position":6}
Unbundling with:
jsrmx unbundle letters.ndjson letters/
Will create these files:
letters/object-000001.json
letters/object-000002.json
letters/object-000003.json
letters/object-000004.json
letters/object-000005.json
letters/object-000006.json
The contents of each file will be pretty-printed by default:
// cat letters/object-000001.json
{
"name": "alpha",
"letter": {
"uppercase": "A",
"lowercase": "a"
},
"position": 1
}
Using the --compact option we can keep them as single-line entries:
jsrmx unbundle --compact letters.ndjson letters/
// cat letters/object-000001.json
{"name":"alpha","letter":{"uppercase":"A","lowercase":"a"},"position":1}
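The default unbundle behaviour, with its numbered file names and the pretty/compact switch, can be sketched like this. The function name unbundle is hypothetical, and skipping blank lines is an assumption; only the object-000001.json naming pattern is taken from the examples above.

```python
import json
from pathlib import Path


def unbundle(ndjson_text, out_dir, compact=False):
    """Write each NDJSON line to out_dir as object-NNNNNN.json.

    Hypothetical helper; returns the file names written, in input order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    names = []
    # Blank lines are skipped here; that is an assumption of this sketch.
    records = (line for line in ndjson_text.splitlines() if line.strip())
    for index, line in enumerate(records, start=1):
        record = json.loads(line)
        if compact:
            text = json.dumps(record, separators=(",", ":"))
        else:
            text = json.dumps(record, indent=2)
        name = f"object-{index:06d}.json"
        (out / name).write_text(text + "\n", encoding="utf-8")
        names.append(name)
    return names
```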
For descriptive filenames, use the --name option. For a filename made of ${name}.json run:
jsrmx unbundle --name=name letters.ndjson letters/
This will output ${name}.json filenames:
letters/alpha.json
letters/bravo.json
letters/charlie.json
letters/delta.json
letters/echo.json
letters/foxtrot.json
The --name option also works on nested values as long as the JSON path is . delimited. Periods inside key names will not resolve properly.
jsrmx unbundle --name=letter.lowercase letters.ndjson letters/
This will output ${letter.lowercase}.json filenames:
letters/a.json
letters/b.json
letters/c.json
letters/d.json
letters/e.json
letters/f.json
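The path resolution behind --name can be illustrated with a short Python sketch. The helper names resolve and filename_for are hypothetical and this is not the jsrmx code; it only mirrors the documented rules: paths are split on periods, the first non-null value wins, and keys that themselves contain a period cannot be reached.

```python
def resolve(record, dotted_path):
    """Follow a period-delimited path through nested dicts; None if missing."""
    current = record
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def filename_for(record, name_paths, fallback):
    """Use the first path that yields a non-null value, else the fallback name."""
    for path in name_paths:
        value = resolve(record, path)
        if value is not None:
            return f"{value}.json"
    return fallback
```

With the alpha record from letters.ndjson, the path letter.lowercase yields a.json, while a record whose key is literally "letter.lowercase" falls through to the fallback because the path is split on every period.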