The main motivation for this tool is to convert traditional text-based (and line-based) logs to JSON for programs which do not support JSON logs themselves. It can be used in an online manner (piping the output of a program into regex2json, e.g., as a log processor in the runit and dinit init systems) or in an offline manner (to process logs stored in files). But the tool is more general and can enable any workflow where you prefer operating on JSON instead of text. It works especially well when combined with jq.
Features:
- Reads stdin line by line, converting each line to JSON to stdout.
- Supports transformations of matched capture groups by specifying the transformation as capture group's name.
- Transformation consists of a series of operators (e.g., parsing numbers, timestamps, creating arrays and objects).
- Supports regexp matching a line multiple times, combining all matches into one JSON.
The releases page contains a list of stable versions. Each includes:
- Statically compiled binaries.
- Docker images.
You should just download/use the latest one.
The tool is implemented in Go. You can also use go install to install the latest stable (released) version:
go install gitlab.com/tozd/regex2json/cmd/regex2json@latest
To install the latest development version (main branch):
go install gitlab.com/tozd/regex2json/cmd/regex2json@main
regex2json reads lines from stdin, matching every line against the provided regexp. If a line matches, values from named capture groups are mapped into the output JSON, which is then written to stdout. If a line does not match, it is written to stderr.
Capture groups' names are compiled into Expressions and describe how matched values are mapped and transformed into the output JSON. See Expression for details on the syntax and Library for available operators.
Any error (e.g., a failed expression) is logged to stderr while the rest of the output JSON is still written out.
If the regexp can match multiple times per line, all matches are combined into a single JSON output for that line.
Usage:
regex2json <regexp>
Example:
$ while true; do LC_ALL=C date; sleep 1; done | regex2json "(?P<date___time__UnixDate__RFC3339>.+)"
{"date":"2023-06-13T11:26:45Z"}
{"date":"2023-06-13T11:26:46Z"}
{"date":"2023-06-13T11:26:47Z"}
Example:
$ echo '192.168.0.100 - - [13/Jun/2023:13:15:13 +0000] "GET /index.html HTTP/1.1" 200 1234 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"' | \
regex2json '^(?P<address>\S+) - (?P<user>\S+) \[(?P<time___time__Nginx__RFC3339>[\w:/]+\s[+\-]\d{4})\] "(?P<method>\S+)\s?(?P<url>\S+)?\s?(?P<http>\S+)?" (?P<status___int>\d{3}) (?:(?P<size___int>\d+)|-) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
{"address":"192.168.0.100","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36","http":"HTTP/1.1","method":"GET","referrer":"-","size":1234,"status":200,"time":"2023-06-13T13:15:13Z","url":"/index.html","user":"-"}
This is also a Go package. You can add it to your project using go get:
go get gitlab.com/tozd/regex2json
It requires Go 1.20 or newer.
See the full package documentation on pkg.go.dev for details on using regex2json as a Go package.
Feel free to make a merge request to add more time layouts and/or operators.
regex2json is implemented in Go and uses its standard regexp package for parsing and compiling regular expressions.
This is a consequence of the limitation on which characters can be in a capture group name in Go ([A-Za-z0-9_]+).
See this issue for more details.
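The restriction on capture group names can be checked directly with Go's standard regexp package. The following is a small sketch; `validGroupName` is a hypothetical helper written for this illustration and is not part of regex2json.

```go
package main

import (
	"fmt"
	"regexp"
)

// validGroupName (hypothetical helper) reports whether Go's regexp package
// accepts name as a capture group name.
func validGroupName(name string) bool {
	_, err := regexp.Compile("(?P<" + name + ">.)")
	return err == nil
}

func main() {
	// Only the underscore-based name compiles; the other two contain
	// characters outside [A-Za-z0-9_] and are rejected.
	for _, name := range []string{"status___int", "status.int", "time:RFC3339"} {
		fmt.Printf("%-14s valid=%v\n", name, validGroupName(name))
	}
}
```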
- jc – jc follows the same idea of converting text-based output of programs into JSON, but its focus is on supporting popular programs out of the box. regex2json instead enables quick, ad hoc transformations: you provide a regexp whose expressions describe how captured groups are transformed into JSON.
There is also a read-only GitHub mirror available, if you need to fork the project there.