Remove bitflags dependency #1436

Merged
merged 2 commits into from
Mar 27, 2019

npmccallum

The only real use of bitflags was to save a tiny bit of memory. It isn't
exposed externally, and we can replace it with a flag enumeration combined
with a HashSet. Because of this, let's not impose this dependency on all
projects which depend on clap.
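
For readers unfamiliar with the pattern, here is a minimal sketch of what "a flag enumeration combined with a HashSet" could look like. It is not the code from this PR: the `Setting` variants and the `Flags` wrapper are hypothetical names chosen for illustration, and clap's real internal settings differ.

```rust
use std::collections::HashSet;

// Hypothetical setting names, for illustration only (not clap's actual list).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Setting {
    Required,
    Hidden,
    TakesValue,
}

// A set of enabled settings backed by a HashSet instead of a bit field.
#[derive(Debug, Default)]
struct Flags(HashSet<Setting>);

impl Flags {
    // Enable one setting; the value is hashed on every call.
    fn set(&mut self, s: Setting) {
        self.0.insert(s);
    }

    // Disable one setting.
    fn unset(&mut self, s: Setting) {
        self.0.remove(&s);
    }

    // Check whether a setting is enabled; this also hashes the value.
    fn is_set(&self, s: Setting) -> bool {
        self.0.contains(&s)
    }
}
```

Every `set`, `unset`, and `is_set` call goes through the hasher, which is the cost discussed later in this thread.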

@spacekookie
Contributor

This isn't really something we can merge against master because of the breaking change.

@npmccallum
Author

  1. What breaking change? All the type and code changes are internal.
  2. Where do you merge breaking changes besides master?

@spacekookie
Contributor

Ah, apologies, your description of the change made it sound like this was something re-exported to depending crates. Generally, breaking changes are being made on v3-master and v3-dev.

So...I don't really have an opinion on merging this. I will leave it open for a bit longer to let other maintainers voice concerns if they have them, otherwise merge it in a few days.

@npmccallum
Author

@spacekookie Should I make separate pull requests against v3-master/v3-dev?

@BurntSushi
Contributor

I'm not involved in clap development, but I definitely appreciate reducing my dependency list when something isn't carrying its weight. :-) I've been trying to do the same for my crates.

@spacekookie
Contributor

@npmccallum v3-master seems to be the correct place, yea. If the dependency is still present, feel free to open another PR for its removal

@spacekookie spacekookie merged commit 6ba4772 into clap-rs:master Mar 27, 2019
@kbknapp
Member

kbknapp commented Apr 4, 2019

This is a commit I'd like to revert, in favor of removing bitflags but replacing it with a home-grown solution, since we're only using a small fraction of what bitflags provides.

To be clear, I'm all for reducing deps, and bitflags is a prime candidate. However, moving to HashSet has two issues:

  • It incurs a ton of runtime overhead with hashing
  • "settings" cannot be combined efficiently

Depending on the exact use case, this commit can roughly double or triple clap parse time. I'm specifically leery of cases like ripgrep, where hundreds or thousands of arguments can end up being supplied through globs. In the ripgrep parse_lots benchmark below, with a few hundred arguments, parse time goes from ~0.4ms to ~1.2ms, although @BurntSushi would have to say if those numbers are still low enough to not matter much. One of my goals with v3 is to reduce that specific benchmark to ~0.1ms.

Here are the benchmarks pre-commit:

     Running target/release/deps/01_default-abb0aa36a0d401cd

running 2 tests
test build_app ... bench: 46 ns/iter (+/- 6)
test parse_clean ... bench: 521 ns/iter (+/- 64)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out

 Running target/release/deps/02_simple-03f9855755188389

running 12 tests
test add_flag ... bench: 162 ns/iter (+/- 20)
test add_flag_ref ... bench: 186 ns/iter (+/- 42)
test add_opt ... bench: 223 ns/iter (+/- 30)
test add_opt_ref ... bench: 283 ns/iter (+/- 90)
test add_pos ... bench: 156 ns/iter (+/- 43)
test add_pos_ref ... bench: 199 ns/iter (+/- 27)
test build_app ... bench: 722 ns/iter (+/- 93)
test parse_clean ... bench: 1,469 ns/iter (+/- 223)
test parse_complex ... bench: 3,339 ns/iter (+/- 529)
test parse_flag ... bench: 1,916 ns/iter (+/- 208)
test parse_option ... bench: 2,293 ns/iter (+/- 240)
test parse_positional ... bench: 2,110 ns/iter (+/- 187)

test result: ok. 0 passed; 0 failed; 0 ignored; 12 measured; 0 filtered out

 Running target/release/deps/03_complex-cf1d36594ae13a8b

running 15 tests
test create_app_builder ... bench: 2,681 ns/iter (+/- 254)
test create_app_from_usage ... bench: 4,063 ns/iter (+/- 466)
test create_app_macros ... bench: 2,713 ns/iter (+/- 369)
test parse_clean ... bench: 4,732 ns/iter (+/- 285)
test parse_complex1 ... bench: 11,446 ns/iter (+/- 1,727)
test parse_complex2 ... bench: 11,719 ns/iter (+/- 1,484)
test parse_complex2_with_args_negate_scs ... bench: 11,626 ns/iter (+/- 2,203)
test parse_flag ... bench: 5,953 ns/iter (+/- 873)
test parse_option ... bench: 5,831 ns/iter (+/- 1,036)
test parse_positional ... bench: 5,850 ns/iter (+/- 558)
test parse_sc_clean ... bench: 6,623 ns/iter (+/- 745)
test parse_sc_complex ... bench: 9,081 ns/iter (+/- 1,226)
test parse_sc_flag ... bench: 8,079 ns/iter (+/- 1,033)
test parse_sc_option ... bench: 7,662 ns/iter (+/- 532)
test parse_sc_positional ... bench: 7,692 ns/iter (+/- 525)

test result: ok. 0 passed; 0 failed; 0 ignored; 15 measured; 0 filtered out

 Running target/release/deps/04_new_help-ca7488d443fe5853

running 10 tests
test example1 ... bench: 13,159 ns/iter (+/- 1,534)
test example10 ... bench: 5,450 ns/iter (+/- 827)
test example2 ... bench: 1,889 ns/iter (+/- 128)
test example3 ... bench: 13,530 ns/iter (+/- 1,689)
test example4 ... bench: 7,926 ns/iter (+/- 752)
test example4_template ... bench: 7,406 ns/iter (+/- 957)
test example5 ... bench: 4,886 ns/iter (+/- 675)
test example6 ... bench: 4,047 ns/iter (+/- 522)
test example7 ... bench: 7,062 ns/iter (+/- 840)
test example8 ... bench: 6,876 ns/iter (+/- 671)

test result: ok. 0 passed; 0 failed; 0 ignored; 10 measured; 0 filtered out

 Running target/release/deps/05_ripgrep-b8302848a935955e

running 7 tests
test build_app_long ... bench: 11,724 ns/iter (+/- 1,716)
test build_app_short ... bench: 12,107 ns/iter (+/- 1,558)
test build_help_long ... bench: 205,504 ns/iter (+/- 14,736)
test build_help_short ... bench: 84,388 ns/iter (+/- 9,844)
test parse_clean ... bench: 13,706 ns/iter (+/- 1,637)
test parse_complex ... bench: 23,165 ns/iter (+/- 2,262)
test parse_lots ... bench: 447,323 ns/iter (+/- 53,461)

test result: ok. 0 passed; 0 failed; 0 ignored; 7 measured; 0 filtered out

 Running target/release/deps/06_rustup-fc82f6957c63b55b

running 3 tests
test build_app ... bench: 14,156 ns/iter (+/- 2,108)
test parse_clean ... bench: 15,900 ns/iter (+/- 2,102)
test parse_subcommands ... bench: 15,659 ns/iter (+/- 3,058)

test result: ok. 0 passed; 0 failed; 0 ignored; 3 measured; 0 filtered out

And post-commit:

     Running target/release/deps/01_default-725ae2b37c6dbdf4

running 2 tests
test build_app ... bench: 307 ns/iter (+/- 68)
test parse_clean ... bench: 1,181 ns/iter (+/- 153)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out

 Running target/release/deps/02_simple-839949c9e9be0e4d

running 12 tests
test add_flag ... bench: 665 ns/iter (+/- 64)
test add_flag_ref ... bench: 735 ns/iter (+/- 69)
test add_opt ... bench: 749 ns/iter (+/- 107)
test add_opt_ref ... bench: 942 ns/iter (+/- 118)
test add_pos ... bench: 753 ns/iter (+/- 116)
test add_pos_ref ... bench: 826 ns/iter (+/- 113)
test build_app ... bench: 1,603 ns/iter (+/- 137)
test parse_clean ... bench: 3,104 ns/iter (+/- 1,122)
test parse_complex ... bench: 6,409 ns/iter (+/- 640)
test parse_flag ... bench: 4,063 ns/iter (+/- 771)
test parse_option ... bench: 4,243 ns/iter (+/- 551)
test parse_positional ... bench: 4,372 ns/iter (+/- 510)

test result: ok. 0 passed; 0 failed; 0 ignored; 12 measured; 0 filtered out

 Running target/release/deps/03_complex-2a861415386d9071

running 15 tests
test create_app_builder ... bench: 9,001 ns/iter (+/- 1,011)
test create_app_from_usage ... bench: 9,458 ns/iter (+/- 879)
test create_app_macros ... bench: 9,192 ns/iter (+/- 919)
test parse_clean ... bench: 11,504 ns/iter (+/- 1,295)
test parse_complex1 ... bench: 21,803 ns/iter (+/- 2,790)
test parse_complex2 ... bench: 24,081 ns/iter (+/- 2,533)
test parse_complex2_with_args_negate_scs ... bench: 25,348 ns/iter (+/- 4,197)
test parse_flag ... bench: 14,085 ns/iter (+/- 2,347)
test parse_option ... bench: 14,860 ns/iter (+/- 2,602)
test parse_positional ... bench: 14,616 ns/iter (+/- 2,276)
test parse_sc_clean ... bench: 14,712 ns/iter (+/- 1,641)
test parse_sc_complex ... bench: 19,143 ns/iter (+/- 2,990)
test parse_sc_flag ... bench: 17,035 ns/iter (+/- 2,343)
test parse_sc_option ... bench: 17,511 ns/iter (+/- 2,409)
test parse_sc_positional ... bench: 16,459 ns/iter (+/- 2,760)

test result: ok. 0 passed; 0 failed; 0 ignored; 15 measured; 0 filtered out

 Running target/release/deps/04_new_help-8d207a285f8a6b1f

running 10 tests
test example1 ... bench: 15,297 ns/iter (+/- 2,049)
test example10 ... bench: 5,897 ns/iter (+/- 868)
test example2 ... bench: 2,034 ns/iter (+/- 131)
test example3 ... bench: 14,776 ns/iter (+/- 1,494)
test example4 ... bench: 8,752 ns/iter (+/- 1,041)
test example4_template ... bench: 7,649 ns/iter (+/- 1,190)
test example5 ... bench: 4,747 ns/iter (+/- 374)
test example6 ... bench: 4,983 ns/iter (+/- 859)
test example7 ... bench: 8,328 ns/iter (+/- 1,082)
test example8 ... bench: 8,855 ns/iter (+/- 1,080)

test result: ok. 0 passed; 0 failed; 0 ignored; 10 measured; 0 filtered out

 Running target/release/deps/05_ripgrep-da3f6f245ccfd553

running 7 tests
test build_app_long ... bench: 37,648 ns/iter (+/- 7,795)
test build_app_short ... bench: 36,538 ns/iter (+/- 3,514)
test build_help_long ... bench: 220,319 ns/iter (+/- 27,991)
test build_help_short ... bench: 96,602 ns/iter (+/- 13,511)
test parse_clean ... bench: 41,439 ns/iter (+/- 3,656)
test parse_complex ... bench: 53,659 ns/iter (+/- 4,716)
test parse_lots ... bench: 1,186,806 ns/iter (+/- 119,096)

test result: ok. 0 passed; 0 failed; 0 ignored; 7 measured; 0 filtered out

 Running target/release/deps/06_rustup-31b2808b94ca14b8

running 3 tests
test build_app ... bench: 48,345 ns/iter (+/- 3,328)
test parse_clean ... bench: 63,096 ns/iter (+/- 9,434)
test parse_subcommands ... bench: 60,744 ns/iter (+/- 6,656)

test result: ok. 0 passed; 0 failed; 0 ignored; 3 measured; 0 filtered out
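
To put the two listings side by side, the slowdown for a given benchmark is simply the post-commit time divided by the pre-commit time. The sketch below does that arithmetic for three rows copied from the output above; the helper names `slowdown` and `report` are made up for this example.

```rust
/// Ratio of post-commit to pre-commit time for one benchmark.
fn slowdown(pre_ns: f64, post_ns: f64) -> f64 {
    post_ns / pre_ns
}

// (benchmark, pre-commit ns/iter, post-commit ns/iter), copied from the
// listings in this comment. Only three rows are included here.
const SAMPLES: [(&str, f64, f64); 3] = [
    ("05_ripgrep parse_lots", 447_323.0, 1_186_806.0),
    ("05_ripgrep parse_clean", 13_706.0, 41_439.0),
    ("06_rustup parse_clean", 15_900.0, 63_096.0),
];

/// One formatted line per sample, e.g. "<name>: <ratio>x".
fn report() -> Vec<String> {
    SAMPLES
        .iter()
        .map(|(name, pre, post)| format!("{}: {:.2}x", name, slowdown(*pre, *post)))
        .collect()
}
```

Printing each line of `report()` gives a quick per-benchmark view. The error bars in the raw output are ignored here, so the ratios are only rough indicators.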

As for combining settings, bitflags (or a home-grown bit-flag-style field) lets a single operation set multiple settings at once (Setting1 | Setting2 | Setting3), whereas a HashSet requires setting all three independently, incurring the hashing overhead each time.

Also, when I say combining flags, I'm speaking about clap's internals, where a single user-facing setting is used to indicate to clap itself that multiple "core" settings are in effect.
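
A minimal sketch of the kind of home-grown bit field described here, assuming a `u32` is wide enough to hold every setting. The flag names, bit positions, and the `REQUIRED_WITH_VALUE` alias are hypothetical and do not reflect clap's actual internals.

```rust
use std::ops::BitOr;

// A set of settings packed into one integer; each setting is one bit.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct Flags(u32);

impl Flags {
    // Hypothetical "core" settings, for illustration only.
    const REQUIRED: Flags = Flags(1 << 0);
    const HIDDEN: Flags = Flags(1 << 1);
    const TAKES_VALUE: Flags = Flags(1 << 2);

    // A single user-facing setting that implies several core settings.
    const REQUIRED_WITH_VALUE: Flags = Flags(Self::REQUIRED.0 | Self::TAKES_VALUE.0);

    // Enable every bit present in `other` with one OR, no hashing.
    fn set(&mut self, other: Flags) {
        self.0 |= other.0;
    }

    // Clear every bit present in `other`.
    fn unset(&mut self, other: Flags) {
        self.0 &= !other.0;
    }

    // True only if all bits in `other` are enabled.
    fn is_set(self, other: Flags) -> bool {
        (self.0 & other.0) == other.0
    }
}

// Allows `Flags::REQUIRED | Flags::HIDDEN` style combination.
impl BitOr for Flags {
    type Output = Flags;

    fn bitor(self, rhs: Flags) -> Flags {
        Flags(self.0 | rhs.0)
    }
}
```

With this shape, `flags.set(Flags::REQUIRED | Flags::HIDDEN | Flags::TAKES_VALUE)` is a single integer OR, and an alias such as `REQUIRED_WITH_VALUE` costs nothing extra at runtime.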

I'll wait on the revert to get some other opinions before doing so.

@kbknapp
Member

kbknapp commented Apr 4, 2019

I should also mention I'm not sure what the balance is between deps and home grown functionality. Some people prefer as few deps as possible, but then others prefer clap to be as lean as possible. So I tend to prefer opt-out deps if possible. In this specific case, I think we're using a small enough portion of bitflags that the home-grown solution is probably fine so we get the benefit of one less dep (not including any transitive deps). 😄

@BurntSushi
Contributor

BurntSushi commented Apr 4, 2019

@kbknapp Thanks for the investigation and attention to performance. :-) Balancing dependencies is tricky.

So basically what I usually do for this is to look at two things: 1) holistic performance using xargs across a big directory and 2) single invocation performance.

For (1), here's an example, on a checkout of the Linux kernel:

$ hyperfine 'find ./ -type f -print0 | xargs -0 rg-clap-2.32.0 --no-config -j1 -q a'
Benchmark #1: find ./ -type f -print0 | xargs -0 rg-clap-2.32.0 --no-config -j1 -q a
  Time (mean ± σ):     271.7 ms ±  19.4 ms    [User: 186.9 ms, System: 214.9 ms]
  Range (min … max):   240.0 ms … 301.4 ms    10 runs

$ hyperfine 'find ./ -type f -print0 | xargs -0 rg-clap-6ba477 --no-config -j1 -q a'
Benchmark #1: find ./ -type f -print0 | xargs -0 rg-clap-6ba477 --no-config -j1 -q a
  Time (mean ± σ):     296.1 ms ±  22.2 ms    [User: 206.8 ms, System: 219.1 ms]
  Range (min … max):   253.4 ms … 319.9 ms    10 runs

(To explain the flags: -j1 makes sure ripgrep only spawns one thread, since that could otherwise screw with our measurements. The -q flag means ripgrep won't print any output and will quit as soon as it finds a match. We search for a, which is pretty much guaranteed to find a match immediately. This way, we benchmark ripgrep's overhead as much as possible. Finally, we use --no-config to make sure we're using the default config. And in particular, prevent ripgrep from parsing argv twice, which it does if there's a config file.)

There's a ton of variance here (which is interesting), but the short story is that this PR definitely seems to be noticeable here. Compare this with grep's performance, which is killing it:

$ hyperfine 'find ./ -type f -print0 | xargs -0 grep -q a'
Benchmark #1: find ./ -type f -print0 | xargs -0 grep -q a
  Time (mean ± σ):      97.9 ms ±   7.7 ms    [User: 87.1 ms, System: 98.1 ms]
  Range (min … max):    84.9 ms … 114.9 ms    25 runs

Looking at the output of time, it looks like grep has a lot less sys time. In any case, grep is kind of ancillary to this specific PR.

The other way I benchmark this is with just a single invocation. I use zsh, which means I can use globs to fill up ripgrep's positional arguments easily:

$ time rg-clap-2.32.0 --no-config -j1 -q a **/*.[ch]

real    0.428
user    0.226
sys     0.200
maxmem  34 MB
faults  0

$ time rg-clap-6ba477 --no-config -j1 -q a **/*.[ch]

real    0.477
user    0.263
sys     0.213
maxmem  34 MB
faults  0

These commands unfortunately have quite a bit of variance (as expected given the hyperfine output above). These are harder to benchmark with hyperfine since they rely on glob expansion. But we can establish a baseline with grep:

$ time grep -q a **/*.[ch]

real    0.370
user    0.185
sys     0.182
maxmem  24 MB
faults  0

This suggests glob expansion might actually be taking a fair bit of time. OK, so let's cache glob expansion in a file:

$ echo **/*.[ch] > /tmp/args

And now we can use hyperfine and see the difference a bit more clearly:

$ hyperfine 'rg-clap-2.32.0 --no-config -j1 -q a $(cat /tmp/args)'
Benchmark #1: rg-clap-2.32.0 --no-config -j1 -q a $(cat /tmp/args)
  Time (mean ± σ):     157.6 ms ±  13.1 ms    [User: 87.1 ms, System: 72.3 ms]
  Range (min … max):   142.4 ms … 175.4 ms    17 runs

$ hyperfine 'rg-clap-6ba477 --no-config -j1 -q a $(cat /tmp/args)'
Benchmark #1: rg-clap-6ba477 --no-config -j1 -q a $(cat /tmp/args)
  Time (mean ± σ):     178.0 ms ±  14.4 ms    [User: 106.3 ms, System: 73.5 ms]
  Range (min … max):   156.1 ms … 195.0 ms    15 runs

$ hyperfine 'grep -q a $(cat /tmp/args)'
Benchmark #1: grep -q a $(cat /tmp/args)
  Time (mean ± σ):      65.5 ms ±   3.0 ms    [User: 48.4 ms, System: 19.0 ms]
  Range (min … max):    59.9 ms …  74.3 ms    43 runs

Finally, if I take perf to rg-clap-2.32.0 and rg-clap-6ba477, I can definitely see the effects of using a HashSet here. It isn't that much, but it's there.

My feeling is that this isn't that big of a performance regression. Clap is still fast. But it's definitely measurable and maybe dropping the dependency isn't worth it. But if it's easy to do without bitflags without sacrificing performance, then maybe that's a good path to take.

@kbknapp
Member

kbknapp commented Apr 4, 2019

Wow, your thoroughness always amazes me!

Although I tend to try and stay away from slippery slope mentalities, performance is one area where I do worry about the sum of regressions, since they can't be considered purely in a vacuum. Allowable performance regressions, while vague because they have to be weighed against any tangible gain, are close to a zero-sum game in my view.

Luckily, I think all we really need out of a home grown solution is pretty minimal and easy to implement. So we may well be able to get the best of both worlds.

@npmccallum
Author

All, just to be clear, I'm not offended if you need to revert my previous patch for performance reasons. I'm glad there is still a desire to remove the bitflags dependency.

kbknapp added a commit that referenced this pull request Apr 5, 2019
@kbknapp kbknapp mentioned this pull request Oct 4, 2021