Parameterize the map type used for structured data #19

Open
wants to merge 3 commits into
base: master

Roguelazer
Owner

This still defaults to BTreeMap but allows you to use a HashMap (or, if you enable the right feature, an IndexMap) instead.

This includes major breaking API changes (e.g., removal of the deref to BTreeMap, which was kind of dumb anyway).
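For readers unfamiliar with the pattern, here is a minimal sketch of what parameterizing a container over its map type with a BTreeMap default can look like. The names `ParamMap` and `StructuredData` and their methods are hypothetical and are not taken from this PR; the actual trait bounds and API in the crate may differ.

```rust
use std::collections::hash_map::RandomState;
use std::collections::{BTreeMap, HashMap};
use std::hash::BuildHasher;

/// Hypothetical trait: the few operations this sketch needs from a map.
pub trait ParamMap: Default {
    fn insert_param(&mut self, key: String, value: String) -> Option<String>;
    fn get_param(&self, key: &str) -> Option<&String>;
}

impl ParamMap for BTreeMap<String, String> {
    fn insert_param(&mut self, key: String, value: String) -> Option<String> {
        self.insert(key, value)
    }
    fn get_param(&self, key: &str) -> Option<&String> {
        self.get(key)
    }
}

// Generic over the hasher, so an alternative hasher could be plugged in.
impl<S: BuildHasher + Default> ParamMap for HashMap<String, String, S> {
    fn insert_param(&mut self, key: String, value: String) -> Option<String> {
        self.insert(key, value)
    }
    fn get_param(&self, key: &str) -> Option<&String> {
        self.get(key)
    }
}

/// Hypothetical container; the map type `M` defaults to `BTreeMap`.
#[derive(Debug, Default)]
pub struct StructuredData<M = BTreeMap<String, String>> {
    params: M,
}

impl<M: ParamMap> StructuredData<M> {
    pub fn new() -> Self {
        Self { params: M::default() }
    }

    /// Stores a parameter and returns the previous value for that key, if any.
    pub fn insert(&mut self, key: &str, value: &str) -> Option<String> {
        self.params.insert_param(key.to_owned(), value.to_owned())
    }

    /// Explicit accessor instead of a `Deref` to the underlying map.
    pub fn get(&self, key: &str) -> Option<&str> {
        self.params.get_param(key).map(String::as_str)
    }
}

fn main() {
    // Writing the type without arguments selects the BTreeMap default.
    let mut ordered: StructuredData = StructuredData::new();
    ordered.insert("eventID", "1011");

    // Callers opt in to a different map by naming it explicitly.
    let mut hashed: StructuredData<HashMap<String, String, RandomState>> =
        StructuredData::new();
    hashed.insert("eventID", "1011");

    println!("{:?} {:?}", ordered.get("eventID"), hashed.get("eventID"));
}
```

Exposing explicit accessors rather than `Deref` keeps the wrapper's API independent of whichever map is chosen, which is one plausible reason for dropping the deref.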

I added a small command-line program to compare the implementations. BTreeMap still consistently outperforms HashMap and IndexMap (even with fxhash) on my test data, at least on the ARM64 MacBook Pro I'm currently typing on. Example output:

$ time ./target/release/examples/validate_file -m btreemap messages
count ok: 40000

________________________________________________________
Executed in  427.08 millis    fish           external
   usr time  412.84 millis   66.00 micros  412.78 millis
   sys time   12.46 millis  1073.00 micros   11.38 millis

$ time ./target/release/examples/validate_file -m indexmap+fxhash messages
count ok: 40000

________________________________________________________
Executed in  442.98 millis    fish           external
   usr time  428.76 millis   52.00 micros  428.71 millis
   sys time   12.76 millis  895.00 micros   11.87 millis
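As an illustration only, the `-m` values used in this thread (`btreemap`, `hashmap`, `hashmap+fxhash`, `indexmap`, `indexmap+fxhash`) could be mapped to an enum along these lines. The `MapType` enum and its `FromStr` impl are assumptions made for this sketch; the real `validate_file` example may parse its arguments differently.

```rust
use std::str::FromStr;

/// Hypothetical enum naming the map implementations selectable with `-m`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MapType {
    BTreeMap,
    HashMap,
    HashMapFx,
    IndexMap,
    IndexMapFx,
}

impl FromStr for MapType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "btreemap" => Ok(MapType::BTreeMap),
            "hashmap" => Ok(MapType::HashMap),
            "hashmap+fxhash" => Ok(MapType::HashMapFx),
            "indexmap" => Ok(MapType::IndexMap),
            "indexmap+fxhash" => Ok(MapType::IndexMapFx),
            other => Err(format!("unknown map type: {}", other)),
        }
    }
}

fn main() {
    // Parse each value the way a CLI might parse the argument to `-m`.
    for arg in ["btreemap", "hashmap+fxhash", "not-a-map"] {
        match arg.parse::<MapType>() {
            Ok(map_type) => println!("{} -> {:?}", arg, map_type),
            Err(err) => println!("{}", err),
        }
    }
}
```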

This is intended to address #18.

@Roguelazer Roguelazer changed the title Map the map type used for structured data type parameter Parameterize the map type used for structured data type parameter Jan 18, 2021
@Roguelazer Roguelazer changed the title Parameterize the map type used for structured data type parameter Parameterize the map type used for structured data Jan 18, 2021
@Roguelazer
Owner Author

Here's what hyperfine has to say for synthetic data with lots of SD fields:

hyperfine -L maptype btreemap,hashmap,hashmap+fxhash,indexmap,indexmap+fxhash './target/release/examples/validate_file -m {maptype} messages'
Benchmark #1: ./target/release/examples/validate_file -m btreemap messages
  Time (mean ± σ):     391.2 ms ±   8.9 ms    [User: 380.3 ms, System: 9.0 ms]
  Range (min … max):   385.6 ms … 415.6 ms    10 runs

Benchmark #2: ./target/release/examples/validate_file -m hashmap messages
  Time (mean ± σ):     401.7 ms ±   1.8 ms    [User: 391.8 ms, System: 8.5 ms]
  Range (min … max):   399.0 ms … 403.6 ms    10 runs

Benchmark #3: ./target/release/examples/validate_file -m hashmap+fxhash messages
  Time (mean ± σ):     384.2 ms ±   2.5 ms    [User: 374.5 ms, System: 8.4 ms]
  Range (min … max):   380.6 ms … 387.8 ms    10 runs

Benchmark #4: ./target/release/examples/validate_file -m indexmap messages
  Time (mean ± σ):     413.0 ms ±   2.7 ms    [User: 403.2 ms, System: 8.5 ms]
  Range (min … max):   408.3 ms … 416.4 ms    10 runs

Benchmark #5: ./target/release/examples/validate_file -m indexmap+fxhash messages
  Time (mean ± σ):     403.4 ms ±   2.6 ms    [User: 393.7 ms, System: 8.4 ms]
  Range (min … max):   399.5 ms … 406.7 ms    10 runs

Summary
  './target/release/examples/validate_file -m hashmap+fxhash messages' ran
    1.02 ± 0.02 times faster than './target/release/examples/validate_file -m btreemap messages'
    1.05 ± 0.01 times faster than './target/release/examples/validate_file -m hashmap messages'
    1.05 ± 0.01 times faster than './target/release/examples/validate_file -m indexmap+fxhash messages'
    1.07 ± 0.01 times faster than './target/release/examples/validate_file -m indexmap messages'

And here's what it has to say for some production data with 0-3 SD items per line:

Benchmark #1: ./target/release/examples/validate_file -m btreemap very_little_sd.log
  Time (mean ± σ):     132.4 ms ±   4.0 ms    [User: 129.5 ms, System: 1.3 ms]
  Range (min … max):   129.0 ms … 147.0 ms    20 runs

Benchmark #2: ./target/release/examples/validate_file -m hashmap very_little_sd.log
  Time (mean ± σ):     134.1 ms ±   2.5 ms    [User: 131.1 ms, System: 1.5 ms]
  Range (min … max):   130.5 ms … 137.7 ms    21 runs

Benchmark #3: ./target/release/examples/validate_file -m hashmap+fxhash very_little_sd.log
  Time (mean ± σ):     132.1 ms ±   2.6 ms    [User: 129.5 ms, System: 1.3 ms]
  Range (min … max):   128.4 ms … 137.1 ms    22 runs

Benchmark #4: ./target/release/examples/validate_file -m indexmap very_little_sd.log
  Time (mean ± σ):     137.1 ms ±   3.3 ms    [User: 133.9 ms, System: 1.3 ms]
  Range (min … max):   131.6 ms … 144.1 ms    20 runs

Benchmark #5: ./target/release/examples/validate_file -m indexmap+fxhash very_little_sd.log
  Time (mean ± σ):     134.3 ms ±   2.7 ms    [User: 132.0 ms, System: 1.1 ms]
  Range (min … max):   130.7 ms … 139.3 ms    21 runs

Summary
  './target/release/examples/validate_file -m hashmap+fxhash very_little_sd.log' ran
    1.00 ± 0.04 times faster than './target/release/examples/validate_file -m btreemap very_little_sd.log'
    1.02 ± 0.03 times faster than './target/release/examples/validate_file -m hashmap very_little_sd.log'
    1.02 ± 0.03 times faster than './target/release/examples/validate_file -m indexmap+fxhash very_little_sd.log'
    1.04 ± 0.03 times faster than './target/release/examples/validate_file -m indexmap very_little_sd.log'

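In both cases (and only on this particular hardware), the fastest is HashMap + FxHash, but only by the thinnest of margins: its lead over BTreeMap is 1.02 ± 0.02 on the synthetic data and 1.00 ± 0.04 on the production data, which is within the measurement noise.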
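To make the "thinnest of margins" concrete, the sketch below recomputes the relative speed of BTreeMap against HashMap + FxHash from the means and standard deviations reported above. It assumes the uncertainty is combined with first-order error propagation for a ratio of two independent measurements; that is my assumption about how the summary figures are derived, and the `Timing` struct and `relative_speed` function are names invented for this example.

```rust
/// Mean and standard deviation of a benchmark, in milliseconds.
#[derive(Debug, Clone, Copy)]
struct Timing {
    mean_ms: f64,
    stddev_ms: f64,
}

/// Returns (ratio, uncertainty) for `slower.mean / fastest.mean`.
/// Assumes independent measurements and first-order error propagation.
fn relative_speed(slower: Timing, fastest: Timing) -> (f64, f64) {
    let ratio = slower.mean_ms / fastest.mean_ms;
    let rel_slower = slower.stddev_ms / slower.mean_ms;
    let rel_fastest = fastest.stddev_ms / fastest.mean_ms;
    let uncertainty = ratio * (rel_slower.powi(2) + rel_fastest.powi(2)).sqrt();
    (ratio, uncertainty)
}

fn main() {
    // Synthetic data with lots of SD fields (values from the output above).
    let btreemap = Timing { mean_ms: 391.2, stddev_ms: 8.9 };
    let hashmap_fx = Timing { mean_ms: 384.2, stddev_ms: 2.5 };
    let (ratio, unc) = relative_speed(btreemap, hashmap_fx);
    println!("synthetic:  {:.2} ± {:.2}", ratio, unc);

    // Production data with 0-3 SD items per line.
    let btreemap = Timing { mean_ms: 132.4, stddev_ms: 4.0 };
    let hashmap_fx = Timing { mean_ms: 132.1, stddev_ms: 2.6 };
    let (ratio, unc) = relative_speed(btreemap, hashmap_fx);
    println!("production: {:.2} ± {:.2}", ratio, unc);
}
```

In both data sets the ratio minus its uncertainty is at or below 1.0, so these runs do not clearly separate the two implementations.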
