Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implementation of fmt #272

Merged
merged 3 commits into from
Jun 19, 2014
Merged

implementation of fmt #272

merged 3 commits into from
Jun 19, 2014

Conversation

kwantam
Copy link
Contributor

@kwantam kwantam commented Jun 19, 2014

I've got an initial implementation of fmt completed. The options are sort of a combination of GNU and BSD, with a couple extra minor ones that I added because it seemed sensible and relatively straightforward.

Note: for now, this version does not use Knuth-Plass, but everything else is in place using "greedy" breaking. I've got a working Knuth-Plass implementation in Haskell that I'm going to port over.

All options (should) work, and performance is nearly on par with GNU fmt.

(GNU fmt is the first run, this version is the second.)

$ ls -lah playground/long
-rw-r--r-- 1 kwantam kwantam 42M Jun 18 20:51 playground/long
$ perf stat -r 20 fmt playground/long >/dev/null

 Performance counter stats for fmt playground/long' (20 runs):

       1077.217547      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.34% )
                12      context-switches          #    0.011 K/sec                    ( +- 12.48% )
                 3      cpu-migrations            #    0.003 K/sec                    ( +- 27.58% )
               184      page-faults               #    0.171 K/sec                    ( +-  0.04% )
     3,485,343,206      cycles                    #    3.236 GHz                      ( +-  0.06% )
       833,519,491      stalled-cycles-frontend   #   23.91% frontend cycles idle     ( +-  0.20% )
   <not supported>      stalled-cycles-backend   
     7,521,678,405      instructions              #    2.16  insns per cycle        
                                                  #    0.11  stalled cycles per insn  ( +-  0.00% )
     1,439,929,997      branches                  # 1336.712 M/sec                    ( +-  0.00% )
        40,051,616      branch-misses             #    2.78% of all branches          ( +-  0.01% )

       1.078562530 seconds time elapsed                                          ( +-  0.34% )

$ perf stat -r 20 ./fmt playground/long >/dev/null

 Performance counter stats for './fmt playground/long' (20 runs):

       1267.654011      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.23% )
                10      context-switches          #    0.008 K/sec                    ( +-  7.02% )
                 2      cpu-migrations            #    0.002 K/sec                    ( +- 43.83% )
           101,079      page-faults               #    0.080 M/sec                    ( +-  0.00% )
     4,125,868,082      cycles                    #    3.255 GHz                      ( +-  0.03% )
       896,450,011      stalled-cycles-frontend   #   21.73% frontend cycles idle     ( +-  0.05% )
   <not supported>      stalled-cycles-backend   
     9,669,115,061      instructions              #    2.34  insns per cycle        
                                                  #    0.09  stalled cycles per insn  ( +-  0.00% )
     2,302,891,543      branches                  # 1816.656 M/sec                    ( +-  0.00% )
        19,958,464      branch-misses             #    0.87% of all branches          ( +-  0.00% )

       1.268852889 seconds time elapsed                                          ( +-  0.23% )

kwantam added 2 commits June 18, 2014 20:43
Note: for now, this version does not use Knuth-Plass,
but everything else is in place with "greedy" breaking.

All options (should) work, and performance is nearly
on par with GNU fmt.

Squashed commit of the following local commits:

commit ebc12f5
commit 125fdab
commit dadd62a
commit e436fda
commit bbc4f4f
commit 12bc4ec
commit 2e69355
commit 9b15a13
commit ea335eb
Merge: ee92573 23cc41d
commit 23cc41d
commit 2fa7c48
commit eb71558
commit c8baabc
commit ee4fab4
commit c544441
commit e1177d4
commit c7fb30e
commit 99a9406
commit 3d244d6
commit 2d4f09c
commit 947c32b
commit 8556d2a
commit a2e4bc3
Merge: 0308884 439e65d
commit 0308884
commit ac80d88
commit c1d6b36
commit 6539b10
commit 439e65d
commit fac27de
commit 365989c
commit 3dd7136
($exp:expr) => (
match $exp {
Ok(_) => (),
Err(_) => unsafe { ::libc::exit(1) }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be fail!() instead of ::libc::exit(1).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of this was to exit silently if writing to stdout fails. This will happen, e.g., if we get a broken pipe. GNU utils for the most part do not report "Broken pipe" in this case.

If you'd prefer, I can just use crash! or the like here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see what you mean. I suppose this is fine.

@Arcterus
Copy link
Collaborator

Basically all of my problems are with spacing right now. I think the rest of the code is good. Does this support all of the GNU options (you said "sort of a combination" above).

@kwantam
Copy link
Contributor Author

kwantam commented Jun 19, 2014

Got it. I'm pulling in all your comments now, just have to check that I haven't broken anything!

Thanks for the review. Thorough and helpful as I get my Rust-style-legs under me :)

It supports all of the GNU options with almost identical semantics.

The only place where I explicitly broke with GNU semantics is in tagged-paragraph mode. Specifically, if in this mode AND we have a single long line by itself, we have to determine what the indent is. In GNU fmt, it seems to choose the smallest column number divisible by three. To me that is just weird. So I said that we will indent 4 extra spaces in this case. (Note that if we're in tagged paragraph mode and a paragraph has multiple lines in the source file, then the following lines determine this indent level for us, so this is a pretty narrow case.)

The additional options that I added are:

  • -m: a BSD-style option to try to detect and preserve RFC822 headers.
  • -P PSKIP: lines beginning with PSKIP are not processed
  • -x and -X: for -p and -P, respectively, match PREFIX/PSKIP exactly rather than allowing arbitrary indentation level for the prefix. Default behavior of GNU version is to allow arbitrary indenting.
  • -T TABWIDTH: we don't expand tabs in the output, but we want to know how big they are.
  • -g GOAL: newer versions of GNU fmt have this option. Knuth-Plass wants to have a min, max, and goal. This lets us set the GOAL. WIDTH is max, and min is equidistant from GOAL in the other direction. (Note that this isn't implemented yet...)

I'll add another commit to address your comments. I'm not 100% sure on the struct spacing whether you want colon alignment eliminated entirely. I've reduced the spacing to the minimum necessary to line things up.

// handle "init" portion
silent_unwrap!(ostream.write(para.init_str.as_bytes()));
para.init_len
} else if ! para.mail_header {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

! issue from before.


// iterator that produces a stream of Lines from a file
struct FileLines<'a> {
opts : &'a FmtOptions,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spacing.

@Arcterus
Copy link
Collaborator

Your welcome. :)

The indentation thing that you did seems reasonable, so I suppose it's okay that it's not 100% compatible with GNU. Also, the arguments you added seem nice.

You handled the struct formatting thing in a nice way (sorry that I wasn't very clear before).

Some(t) => t,
None => { crash!(1, "Invalid GOAL specification: `{}'", s); }
};
if ! matches.opt_present("w") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, I missed this one too. Sorry. :(

@kwantam
Copy link
Contributor Author

kwantam commented Jun 19, 2014

I believe all your comments are addressed now.

I can squash those last few commits if you don't want me to muck up the history.

@Arcterus
Copy link
Collaborator

Yeah, please squash the last few.

@kwantam
Copy link
Contributor Author

kwantam commented Jun 19, 2014

Done, and thanks again for the comments :)

I should have the Knuth-Plass implementation done in the next couple days, which will make the line breaking somewhat prettier but with a (hopefully only slight) speed impact.

@Arcterus
Copy link
Collaborator

No problem. :)

That sounds good. Looking forward to seeing it.

Arcterus added a commit that referenced this pull request Jun 19, 2014
@Arcterus Arcterus merged commit 99b0a11 into uutils:master Jun 19, 2014
jbcrail pushed a commit to jbcrail/coreutils that referenced this pull request Apr 29, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants