Skip to content
This repository has been archived by the owner on Nov 21, 2022. It is now read-only.

Commit

Permalink
pipe: avoid unnecessary EPOLLET wakeups under normal loads
Browse files Browse the repository at this point in the history
I had forgotten just how sensitive hackbench is to extra pipe wakeups,
and commit 3a34b13 ("pipe: make pipe writes always wake up
readers") ended up causing a quite noticeable regression on larger
machines.

Now, hackbench isn't necessarily a hugely meaningful benchmark, and it's
not clear that this matters in real life all that much, but as Mel
points out, it's used often enough when comparing kernels and so the
performance regression shows up like a sore thumb.

It's easy enough to fix at least for the common cases where pipes are
used purely for data transfer, and you never have any exciting poll
usage at all.  So set a special 'poll_usage' flag when there is polling
activity, and make the ugly "EPOLLET has crazy legacy expectations"
semantics explicit to only that case.

I would love to limit it to just the broken EPOLLET case, but the pipe
code can't see the difference between epoll and regular select/poll, so
any non-read/write waiting will trigger the extra wakeup behavior.  That
is sufficient for at least the hackbench case.

Apart from making the odd extra wakeup cases more explicitly about
EPOLLET, this also makes the extra wakeup be at the _end_ of the pipe
write, not at the first write chunk.  That is actually much saner
semantics (as much as you can call any of the legacy edge-triggered
expectations for EPOLLET "sane") since it means that you know the wakeup
will happen once the write is done, rather than possibly in the middle
of one.

[ For stable people: I'm putting a "Fixes" tag on this, but I leave it
  up to you to decide whether you actually want to backport it or not.
  It likely has no impact outside of synthetic benchmarks  - Linus ]

Link: https://lore.kernel.org/lkml/20210802024945.GA8372@xsang-OptiPlex-9020/
Fixes: 3a34b13 ("pipe: make pipe writes always wake up readers")
Reported-by: kernel test robot <[email protected]>
Tested-by: Sandeep Patil <[email protected]>
Tested-by: Mel Gorman <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
  • Loading branch information
torvalds committed Aug 18, 2021
1 parent 614cb27 commit 3b84482
Show file tree
Hide file tree
Showing 2 changed files with 11 additions and 6 deletions.
15 changes: 9 additions & 6 deletions fs/pipe.c
Original file line number Diff line number Diff line change
Expand Up @@ -444,9 +444,6 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
#endif

/*
* Epoll nonsensically wants a wakeup whether the pipe
* was already empty or not.
*
* If it wasn't empty we try to merge new data into
* the last buffer.
*
Expand All @@ -455,9 +452,9 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
* spanning multiple pages.
*/
head = pipe->head;
was_empty = true;
was_empty = pipe_empty(head, pipe->tail);
chars = total_len & (PAGE_SIZE-1);
if (chars && !pipe_empty(head, pipe->tail)) {
if (chars && !was_empty) {
unsigned int mask = pipe->ring_size - 1;
struct pipe_buffer *buf = &pipe->bufs[(head - 1) & mask];
int offset = buf->offset + buf->len;
Expand Down Expand Up @@ -590,8 +587,11 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
* This is particularly important for small writes, because of
* how (for example) the GNU make jobserver uses small writes to
* wake up pending jobs
*
* Epoll nonsensically wants a wakeup whether the pipe
* was already empty or not.
*/
if (was_empty) {
if (was_empty || pipe->poll_usage) {
wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
}
Expand Down Expand Up @@ -654,6 +654,9 @@ pipe_poll(struct file *filp, poll_table *wait)
struct pipe_inode_info *pipe = filp->private_data;
unsigned int head, tail;

/* Epoll has some historical nasty semantics, this enables them */
pipe->poll_usage = 1;

/*
* Reading pipe state only -- no need for acquiring the semaphore.
*
Expand Down
2 changes: 2 additions & 0 deletions include/linux/pipe_fs_i.h
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@ struct pipe_buffer {
* @files: number of struct file referring this pipe (protected by ->i_lock)
* @r_counter: reader counter
* @w_counter: writer counter
* @poll_usage: is this pipe used for epoll, which has crazy wakeups?
* @fasync_readers: reader side fasync
* @fasync_writers: writer side fasync
* @bufs: the circular array of pipe buffers
Expand All @@ -70,6 +71,7 @@ struct pipe_inode_info {
unsigned int files;
unsigned int r_counter;
unsigned int w_counter;
unsigned int poll_usage;
struct page *tmp_page;
struct fasync_struct *fasync_readers;
struct fasync_struct *fasync_writers;
Expand Down

0 comments on commit 3b84482

Please sign in to comment.