GitHub - dimitarvp/json-log-histogram-rust: Rust take-home assignment given to me by an interviewer

Introduction

This is a Rust command-line tool that calculates a histogram of the distinct types of JSON records (the value of each record's "type" field) in an input JSON log file with one JSON object per line.

A sample input file would be:

{"type":"B","foo":"bar","items":["one","two"]}
{"type": "A","foo": 4.0  }
{"type": "B","bar": "abcd"}

The output histogram would report a count of 2 for type B and 1 for type A. It would also report a total of 73 bytes for type B (46 + 27, the lengths of its two lines without the trailing newline) and 26 bytes for type A.
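A minimal, std-only sketch of the counting logic, assuming the byte total for a type is the sum of its line lengths without the newline (which reproduces the 73 and 26 above). The tool itself parses each record with serde_json; the hypothetical extract_type helper below is a deliberately naive stand-in that only understands a string-valued "type" key and is not a JSON parser.

```rust
use std::collections::HashMap;

/// Per-type statistics: number of records and total bytes.
#[derive(Debug, Default, PartialEq)]
struct Stats {
    count: usize,
    bytes: usize,
}

/// Hypothetical, naive stand-in for serde_json: returns the string value of
/// the first "type" key. No escape handling, no structural validation.
fn extract_type(line: &str) -> Option<&str> {
    let key_end = line.find("\"type\"")? + "\"type\"".len();
    let rest = line[key_end..].trim_start().strip_prefix(':')?.trim_start();
    let rest = rest.strip_prefix('"')?;
    let end = rest.find('"')?;
    Some(&rest[..end])
}

/// Builds the histogram. str::lines() drops the line terminator, so
/// line.len() counts only the bytes of the JSON object itself.
fn histogram(input: &str) -> HashMap<String, Stats> {
    let mut hist: HashMap<String, Stats> = HashMap::new();
    for line in input.lines() {
        if let Some(t) = extract_type(line) {
            let entry = hist.entry(t.to_string()).or_default();
            entry.count += 1;
            entry.bytes += line.len();
        }
    }
    hist
}

fn main() {
    let sample = concat!(
        r#"{"type":"B","foo":"bar","items":["one","two"]}"#, "\n",
        r#"{"type": "A","foo": 4.0  }"#, "\n",
        r#"{"type": "B","bar": "abcd"}"#, "\n"
    );
    let hist = histogram(sample);
    // Sort by type name for stable output.
    let mut types: Vec<_> = hist.iter().collect();
    types.sort_by(|a, b| a.0.cmp(b.0));
    // prints "A: count=1 bytes=26", then "B: count=2 bytes=73"
    for (t, s) in types {
        println!("{}: count={} bytes={}", t, s.count, s.bytes);
    }
}
```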

How to compile and use

Git clone:

git clone https://github.com/dimitarvp/json-log-histogram-rust.git
cd json-log-histogram-rust

Compile:

RUSTFLAGS="-C target-cpu=native" cargo build --release

To try it, generate a JSON log file (one JSON object per line) and pass its path with the -f option:

./target/release/jlh -f /path/to/json/log/file
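The README does not say how its test logs were produced, so the following is only a hypothetical generator: it writes N lines cycling through a few made-up record types to any io::Write. The record shape and the generate_log name are assumptions, not part of the tool.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical generator: writes `n` JSON records, one per line,
/// cycling through a few made-up types with payloads of varying length.
fn generate_log<W: Write>(out: &mut W, n: usize) -> io::Result<()> {
    let types = ["A", "B", "C"];
    for i in 0..n {
        let t = types[i % types.len()];
        let payload = "x".repeat(i % 16);
        // {{ and }} are escaped braces in the format string.
        writeln!(out, r#"{{"type":"{}","id":{},"payload":"{}"}}"#, t, i, payload)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Number of lines from the first argument, defaulting to 10.
    let n = std::env::args()
        .nth(1)
        .and_then(|s| s.parse::<usize>().ok())
        .unwrap_or(10);
    let stdout = io::stdout();
    let mut out = BufWriter::new(stdout.lock());
    generate_log(&mut out, n)?;
    out.flush()
}
```

Redirect its output to a file (for example generator 100000 > test.jsonl, where generator is whatever you name the compiled binary) and pass that file to jlh -f.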

The tool prints an aligned text table and a total runtime at the bottom.
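The tool builds its table with prettytable-rs. As a std-only approximation (not the tool's actual layout), columns can be padded with format! width specifiers and the runtime measured with std::time::Instant; render_table and the column headers are hypothetical.

```rust
use std::time::Instant;

/// Hypothetical renderer: left-aligns the type column to the widest name
/// and right-aligns the numeric columns.
fn render_table(rows: &[(&str, usize, usize)]) -> String {
    let type_w = rows
        .iter()
        .map(|r| r.0.len())
        .max()
        .unwrap_or(0)
        .max("Type".len());
    let mut out = format!("{:<tw$}  {:>8}  {:>10}\n", "Type", "Count", "Bytes", tw = type_w);
    for (t, count, bytes) in rows {
        out.push_str(&format!("{:<tw$}  {:>8}  {:>10}\n", t, count, bytes, tw = type_w));
    }
    out
}

fn main() {
    let start = Instant::now();
    // Rows would normally come from the histogram; these are the sample's values.
    let rows: [(&str, usize, usize); 2] = [("A", 1, 26), ("B", 2, 73)];
    print!("{}", render_table(&rows));
    println!("Total runtime: {:?}", start.elapsed());
}
```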

Benchmarks

CPU	File size	Time in seconds
Xeon W-2150B @ 3.00GHz	1MB	0.11091947
Xeon W-2150B @ 3.00GHz	10MB	0.62043929
Xeon W-2150B @ 3.00GHz	100MB	0.643637170
Xeon W-2150B @ 3.00GHz	1000MB	5.175781744
i7-4870HQ @ 2.50GHz	1MB	0.07234297
i7-4870HQ @ 2.50GHz	10MB	0.68889124
i7-4870HQ @ 2.50GHz	100MB	0.670027735
i7-4870HQ @ 2.50GHz	1000MB	6.659739416
i3-3217U @ 1.80GHz	1MB	0.14369994
i3-3217U @ 1.80GHz	10MB	0.49248859
i3-3217U @ 1.80GHz	100MB	0.535957719
i3-3217U @ 1.80GHz	1000MB	3.773678079

Implementation details and notes

Using Rust 1.43.1.
Using the rayon crate for transparent parallelization of the histogram calculation.
Using the clap crate to parse the command-line options (there is only one: the path to the input JSON log file).
Using the prettytable-rs crate to produce a pretty command-line table with the results.
Using serde_json to deserialize each JSON record into a struct.
Skipped the ability to pipe files to the tool so it can read from stdin. The motivation was that rayon's .par_bridge() could not be applied to the lines of a polymorphic Box<dyn BufRead> (the common denominator of std::io::stdin().lock() and a BufReader wrapping std::fs::File::open(path)): rayon only implements ParallelBridge for Send iterators with Send items, and a plain Box<dyn BufRead> is not Send. I could probably have made it work, but after 2 hours of attempts I realized it might take a long time, so I cut it short.
Used the .lines() method on the BufReader even though it allocates a new String per line. I am aware of the more efficient BufRead::read_line idiom with a single String buffer (which is cleared after every line is consumed), and my initial non-parallel version even used it (see this commit). But I couldn't find a quick way to turn this idiom into something with a .lines()-style interface, since rayon expects an Iterator. I could have implemented Iterator for a wrapping struct or enum but, as above, I suspected it could take a long time. IMO the tool is very fast even with that caveat (see the benchmarks table above).
The commit history got slightly botched because I had to use bfg to remove the 1MB / 10MB / 100MB / 1000MB JSON files that I added earlier (which I replaced with gzipped variants later).
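The notes above rely on rayon's par_bridge over BufRead::lines(). Since rayon is an external crate, here is a std-only sketch of the same split-then-merge idea using std::thread::scope (Rust 1.63+, newer than the 1.43.1 the project uses): each worker builds its own HashMap over a chunk of lines and the partial maps are merged at the end. It also illustrates the Send point behind the stdin note: a Box<dyn BufRead + Send> can hold either a BufReader<File> or a BufReader<Stdin> (but not a StdinLock, which is not Send), and its lines() iterator then satisfies par_bridge's bound. The names open_input and parallel_histogram are hypothetical, and extract_type is a naive stand-in for serde_json.

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::thread;

/// Hypothetical helper: reads a file if a path is given, otherwise stdin.
/// The `+ Send` bound makes `reader.lines()` a Send iterator. BufReader<Stdin>
/// is used because the guard returned by stdin().lock() is not Send.
fn open_input(path: Option<&str>) -> io::Result<Box<dyn BufRead + Send>> {
    let reader: Box<dyn BufRead + Send> = match path {
        Some(p) => Box::new(BufReader::new(File::open(p)?)),
        None => Box::new(BufReader::new(io::stdin())),
    };
    Ok(reader)
}

/// Naive stand-in for serde_json: string value of the first "type" key.
fn extract_type(line: &str) -> Option<String> {
    let key_end = line.find("\"type\"")? + "\"type\"".len();
    let rest = line[key_end..].trim_start().strip_prefix(':')?.trim_start();
    let rest = rest.strip_prefix('"')?;
    Some(rest[..rest.find('"')?].to_string())
}

/// Splits the lines into `workers` chunks, builds one partial map per thread,
/// then merges them. Values are (count, bytes) per type.
fn parallel_histogram(lines: &[String], workers: usize) -> HashMap<String, (usize, usize)> {
    let workers = workers.max(1);
    let chunk_size = ((lines.len() + workers - 1) / workers).max(1);
    let partials: Vec<HashMap<String, (usize, usize)>> = thread::scope(|s| {
        let handles: Vec<_> = lines
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    let mut local: HashMap<String, (usize, usize)> = HashMap::new();
                    for line in chunk {
                        if let Some(t) = extract_type(line) {
                            let e = local.entry(t).or_insert((0, 0));
                            e.0 += 1;
                            e.1 += line.len();
                        }
                    }
                    local
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // Merge step: sum counts and bytes across the per-thread maps.
    let mut total: HashMap<String, (usize, usize)> = HashMap::new();
    for partial in partials {
        for (t, (count, bytes)) in partial {
            let e = total.entry(t).or_insert((0, 0));
            e.0 += count;
            e.1 += bytes;
        }
    }
    total
}

fn main() -> io::Result<()> {
    // Hypothetical usage: pass a path, or "-" to read from stdin.
    let arg = match std::env::args().nth(1) {
        Some(a) => a,
        None => {
            eprintln!("usage: <program> <path | ->");
            return Ok(());
        }
    };
    let reader = open_input(if arg == "-" { None } else { Some(arg.as_str()) })?;
    // Unlike par_bridge, this reads every line up front before splitting the work.
    let lines = reader.lines().collect::<io::Result<Vec<String>>>()?;
    let mut rows: Vec<_> = parallel_histogram(&lines, 4).into_iter().collect();
    rows.sort();
    for (t, (count, bytes)) in rows {
        println!("{} {} {}", t, count, bytes);
    }
    Ok(())
}
```

Collecting all lines first keeps the sketch short but holds the whole file in memory; par_bridge instead pulls lines from the iterator as workers become free.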
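For comparison, the single-buffer read_line idiom mentioned above can be sketched as a plain sequential loop. This is a sketch, not the code from the earlier commit. It also hints at why the idiom does not map directly onto an Iterator: a standard Iterator cannot hand out items that borrow from its own reusable buffer, so a wrapper would still have to yield owned data per item. The histogram_reusing_buffer name and the naive extract_type helper are hypothetical.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead};

/// Naive stand-in for serde_json (hypothetical): value of the first "type" key.
fn extract_type(line: &str) -> Option<&str> {
    let key_end = line.find("\"type\"")? + "\"type\"".len();
    let rest = line[key_end..].trim_start().strip_prefix(':')?.trim_start();
    let rest = rest.strip_prefix('"')?;
    let end = rest.find('"')?;
    Some(&rest[..end])
}

/// Sequential histogram that reuses a single String buffer for every line.
/// Returns (count, bytes) per type, with bytes excluding the line terminator.
fn histogram_reusing_buffer<R: BufRead>(mut reader: R) -> io::Result<HashMap<String, (usize, usize)>> {
    let mut hist: HashMap<String, (usize, usize)> = HashMap::new();
    let mut buf = String::new();
    loop {
        // Clearing keeps the buffer's capacity, so no new allocation per line.
        buf.clear();
        if reader.read_line(&mut buf)? == 0 {
            break; // EOF
        }
        // read_line keeps the terminator; strip "\n" or "\r\n" as lines() would.
        let line = match buf.strip_suffix('\n') {
            Some(l) => l.strip_suffix('\r').unwrap_or(l),
            None => buf.as_str(),
        };
        if let Some(t) = extract_type(line) {
            // get_mut avoids allocating a String key for types already seen.
            if let Some(entry) = hist.get_mut(t) {
                entry.0 += 1;
                entry.1 += line.len();
            } else {
                hist.insert(t.to_string(), (1, line.len()));
            }
        }
    }
    Ok(hist)
}

fn main() -> io::Result<()> {
    let sample = concat!(
        r#"{"type":"B","foo":"bar","items":["one","two"]}"#, "\n",
        r#"{"type": "A","foo": 4.0  }"#, "\n",
        r#"{"type": "B","bar": "abcd"}"#, "\n"
    );
    // &[u8] implements BufRead, which makes the sketch easy to run in memory.
    let mut rows: Vec<_> = histogram_reusing_buffer(sample.as_bytes())?.into_iter().collect();
    rows.sort();
    // prints "A: count=1 bytes=26", then "B: count=2 bytes=73"
    for (t, (count, bytes)) in rows {
        println!("{}: count={} bytes={}", t, count, bytes);
    }
    Ok(())
}
```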
