Minor enhancement to put_short() macro. This change yields a marginal speedup
(about 0% to 3%, depending on the compression level and input). I guess
the speedup likely arises from the following facts:

 1) "s->pending" is now loaded once and stored once. In the original
    implementation, it needs to be loaded and stored twice because the
    compiler isn't able to disambiguate "s->pending" and
    "s->pending_buf[]".

 2) Better code generation:
   2.1) no instructions are needed to extract two bytes from a short.
   2.2) fewer registers are needed.
   2.3) stores to adjacent bytes are merged into a single store, albeit
        at the cost of a potential unaligned-access penalty.

Conflicts:
	trees.c
Shuxin Yang authored and Dead2 committed Oct 8, 2014
1 parent 80e06ec commit 666581b
Showing 1 changed file with 24 additions and 0 deletions.
24 changes: 24 additions & 0 deletions deflate.h
@@ -291,6 +291,30 @@ typedef enum {
*/
#define put_byte(s, c) {s->pending_buf[s->pending++] = (c);}

/* ===========================================================================
* Output a short LSB first on the stream.
* IN assertion: there is enough room in pendingBuf.
*/
#if defined(__x86_64) || defined(__i386__)
/* Compared to the else-clause's implementation, there are a few advantages:
* - s->pending is loaded only once (the else-clause's implementation needs to
* load s->pending twice due to the possible aliasing between s->pending and
* s->pending_buf[]).
* - no instructions are needed for extracting bytes from a short.
* - fewer registers are needed.
* - stores to adjacent bytes are merged into a single store, albeit at the
* cost of a potential unaligned-access penalty.
*/
#define put_short(s, w) { \
s->pending += 2; \
*(ush*)(&s->pending_buf[s->pending - 2]) = (w) ; \
}
#else
#define put_short(s, w) { \
put_byte(s, (uch)((w) & 0xff)); \
put_byte(s, (uch)((ush)(w) >> 8)); \
}
#endif

#define MIN_LOOKAHEAD (MAX_MATCH+MIN_MATCH+1)
/* Minimum amount of lookahead, except at the end of the input file.