Common "footguns" when writing CLI applications #29
I'm not sure that in the year 2018 anyone should not be expecting UTF-8 input 🤔
@spacekookie If your tool is designed to process arbitrary files, and that processing can be done in a somewhat encoding-agnostic way (for example, one might only need the assumption that the text is ASCII compatible), then you really can't use `String`. On Windows this gets a bit more complicated. If you write a CLI tool for Windows that reads files, then at some point you're likely to come across UTF-16. This can be handled automatically via BOM sniffing and transcoding, but there's no out-of-the-box convenient abstraction to handle this.
Sounds like a good candidate for any interested party in the WG to take on.
For streaming transcoding, see:
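As a hedged sketch of what BOM-sniffed streaming transcoding can look like, here is one way to do it with the `encoding_rs_io` crate; the crate choice and the file name are assumptions on my part, not something the thread specifies:

```rust
use std::fs::File;
use std::io::{self, Read};

use encoding_rs_io::DecodeReaderBytes;

fn main() -> io::Result<()> {
    // Assumed input file; it could be UTF-8, UTF-16LE, or UTF-16BE.
    let file = File::open("input.txt")?;
    // DecodeReaderBytes sniffs a BOM and transcodes to UTF-8 on the fly;
    // input without a BOM is passed through as-is.
    let mut reader = DecodeReaderBytes::new(file);
    let mut contents = String::new();
    reader.read_to_string(&mut contents)?;
    println!("read {} UTF-8 bytes", contents.len());
    Ok(())
}
```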
The `String` issue is, in my view, huge. If your CLI app takes a filename but can't handle my non-UTF-8-but-perfectly-valid argument, for example, that's a bug. And in general, non-`String`-but-text data (`[u8]`, `OsString`) feels like a second-class citizen in Rust. Most libraries/APIs/macros require `String`, such as `std::fmt` (including `Debug` and `Display`). And putting unchecked data into a `String` is considered UB. Ideally, an overarching trait or similar could be created to expand existing systems to the non-UTF-8 story. I'm sure that's not a 2018 possibility though, and I would be interested to see what the WG suggests in the meantime. (Side note, since the OP mentioned the "ideal guide" issue: the quicli docs are mentioned as a "concrete example" there. Their front-page example has `file: String`.)
@jeekobu An RFC was recently merged that adds a `Pattern` API for `OsStr`. I think fixing the "programs demand UTF-8, even for file paths" bug is the realm of argv parsers. They should make it very easy to handle file paths correctly.
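A minimal sketch of handling a file-path argument without the UTF-8 footgun, using `std::env::args_os` rather than `std::env::args`; the `tool <FILE>` CLI shape is just an illustrative assumption:

```rust
use std::env;
use std::path::PathBuf;
use std::process;

fn main() {
    // args_os() yields OsString values and never panics on non-UTF-8
    // arguments, unlike args(). PathBuf wraps OsString, so no lossy
    // UTF-8 conversion is required anywhere on the path's way to open().
    let path: PathBuf = match env::args_os().nth(1) {
        Some(arg) => PathBuf::from(arg),
        None => {
            eprintln!("usage: tool <FILE>");
            process::exit(2);
        }
    };
    // display() is lossy only for printing; the underlying path is intact.
    println!("opening {}", path.display());
}
```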
The `Pattern` extension is why I have some hope that quality `&[u8]` and/or `OsStr` support is a future possibility. Anyway, after I made that comment, I found #26, which more directly addresses this hurdle.
Unfortunately I managed to shoot myself in the foot regarding reading lines from stdin; see https://www.reddit.com/r/rust/comments/aaood3/go_version_of_program_is_40_quicker_than_rust/ and the follow-up gist at https://gist.github.com/djhworld/ced7eabca8e0c06d5b65917e87fb8745. I think it would be great if the CLI book could at least document a recommended way of reading newline-delimited input from stdin to avoid falling into these traps, or at least have an example application that does something trivial like converting input to uppercase.
@djhworld Very good point! I have a TODO open for stdin in #95. I'm not entirely sure what the book should recommend precisely; I'd say using a buffered reader is uncontroversial. Reusing the line buffer, however, requires you to be very careful: you must always clear it, and you can't expect to store its contents permanently without cloning. Do you think that reading from stdin using a buffered reader will often be a performance issue? Or would other parts of the program usually be vastly slower?
I think putting this in the book, in almost these exact words, would be super helpful. Coupled with an example it would probably help people a lot.
I've seen the issue of folks unknowingly re-acquiring a lock on stdin over and over come up several times. It seems that at least for trivial benchmarks and exploratory work this is an actual issue. I suspect this might also be an actual bottleneck for people building filtering shell tools.
WG is the acronym for ...?
Working Group
Is there any recommended pattern or best practice for writing a piping CLI that efficiently reads from and writes to stdin/stdout? Even having a reference implementation to look at would be useful.
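A minimal sketch of such a filter, along the lines suggested above (buffered and locked handles, a reused line buffer); the uppercase transform is just the trivial example @djhworld asked for, not a recommendation from the thread:

```rust
use std::io::{self, BufRead, Write};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut input = stdin.lock(); // lock once, not per read
    let stdout = io::stdout();
    let mut output = io::BufWriter::new(stdout.lock()); // lock once, buffer writes

    let mut line = String::new();
    loop {
        line.clear(); // reuse the allocation; forgetting this appends lines
        if input.read_line(&mut line)? == 0 {
            break; // 0 bytes read means EOF
        }
        output.write_all(line.to_uppercase().as_bytes())?;
    }
    output.flush()
}
```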
It seems like we're coming across a few small "footguns", some common, some less so, so it may be nice to collect them in one place so that when writing something like #6 we don't forget anything.
This could happen in #6, but I'd prefer if it were separate so this can dive a little deeper and actually make a list.
Off the top of my head from stuff that's come up today in the gitter and elsewhere:
- Reading `stdin` line by line is slow, because it does a heap alloc at each line. Easiest solution is `BufReader::read_line`
- Using `String` as an input type for arbitrary data that doesn't necessarily need to be UTF-8
- Using `println!` when writing lots of small output, which locks `stdout` at each call. Prefer locking `stdout` manually and using something like `write!` (see the sketch after this list)
- Errors when closing `stdout` going unnoticed (the fix is calling `libc::close` manually, and checking for errors)
- `println!` can panic on a broken pipe
- `std::env::args` panics at invalid UTF-8 (the fix is to use `std::env::args_os`, but this is not very intuitive, and with issues like "Ergonomics of String handling" #26 it's far more friction than it should be)

If you've got more I'll add them. Of course I think we should also be open to discussion about the ones I listed as to why they should, or should not be listed and/or other fixes for them. 😄
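A hedged sketch combining two of the items above: locking `stdout` once instead of per `println!`, and treating a broken pipe as a clean exit rather than a panic. The loop body and iteration count are placeholders:

```rust
use std::io::{self, Write};
use std::process;

fn main() {
    let stdout = io::stdout();
    // Lock once up front instead of once per println! call,
    // and buffer the writes on top of the lock.
    let mut out = io::BufWriter::new(stdout.lock());
    for i in 0..1_000_000u32 {
        // writeln! returns an Err on a broken pipe (e.g. `tool | head`)
        // instead of panicking the way println! does.
        if let Err(e) = writeln!(out, "{}", i) {
            if e.kind() == io::ErrorKind::BrokenPipe {
                return; // downstream closed the pipe; exit quietly
            }
            eprintln!("write error: {}", e);
            process::exit(1);
        }
    }
    // Flush explicitly so buffered-write errors aren't silently dropped.
    if let Err(e) = out.flush() {
        eprintln!("flush error: {}", e);
        process::exit(1);
    }
}
```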
Gitter log of reading `stdin` line by line
@rcoh 18:37: Hello! I'm working on https://github.com/rcoh/angle-grinder. In the process, I noticed that reading from stdin via .lines() is pretty slow. There are ways around it, of course. ripgrep uses a custom-written stdin buffer
@BurntSushi 18:38: yeah it creates an alloc for each new line. ripgrep's buffer is quite a bit more involved. :grinning: easiest path to something fast is BufReader::read_line
@rcoh 18:38: In any case, I was wondering if there were any plans to make reading lots of data from stdin a more generally supported pattern or if ripgrep's pattern could be abstracted into a library in some way
@BurntSushi 18:39: ripgrep doesn't really read line-by-line, so it's a bit different
@rcoh 18:40: but I assume at some point in the internals, it's splitting things into lines?
@BurntSushi 18:40: nope
@rcoh 18:40: oh interesting!
@BurntSushi 18:40: only for output
@rcoh 18:40: doesn't grep have line-by-line semantics? Does ripgrep as well?
@BurntSushi 18:41: in the (very common) case that searching a file yields no matches, it is possible that the concept of a "line" never actually materializes at all
@rcoh 18:41: interesting.
@BurntSushi 18:41: see: https://blog.burntsushi.net/ripgrep/#mechanics (GNU grep does the same thing)
@rcoh 18:42: My current approach is:
Is that the fastest I can do without resorting to a totally different strategy?
@BurntSushi 18:42: pretty much, yes. if you could use a Vec instead of a String, then you can use read_until and avoid the UTF-8 validation check. depends on what you're doing though!
if you're processing data from arbitrary files, then String is very likely the wrong thing to use
@rcoh 18:44: it's intended to process log files so it's almost always printable data except in some weird cases
@BurntSushi 18:44: it all depends on whether you're OK with it choking on non-UTF-8 encoded text
@rcoh 18:53: Got it. Thanks for your help! I literally couldn't think of a more qualified person to have answer my question :grinning:
@BurntSushi 18:53: :grinning: FWIW, I do intend on splitting the core ripgrep search code out into the grep crate, which will include the buffer handling. it's likely possible that the API will be flexible enough to do line-by-line parsing as fast as possible (although, avoiding line-by-line will still of course be faster for many workloads)
@rcoh 19:02: Cool. I don't actually need to be line-by-line until after running some filtering so that would be useful for my use case.
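As a hedged illustration of @BurntSushi's `read_until` suggestion in the log above, here is a loop over raw bytes that skips UTF-8 validation entirely; the line-counting workload is a placeholder:

```rust
use std::io::{self, BufRead};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut reader = stdin.lock();
    let mut buf = Vec::new();
    let mut lines = 0u64;
    // read_until splits on b'\n' into a Vec<u8>, so unlike read_line
    // it never validates UTF-8 and never chokes on arbitrary bytes.
    while reader.read_until(b'\n', &mut buf)? != 0 {
        lines += 1;
        buf.clear(); // reuse the allocation across iterations
    }
    println!("{}", lines);
    Ok(())
}
```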