Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLI Indexer: silently ignore brokenpipe signal #136

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sebastian-nagel
Copy link

This PR makes the cli indexer silently ignore the broken pipe signal (SIGPIPE) if the output is sent to a Unix pipe and not entirely consumed, e.g.

$> warcio index not-too-small.warc.gz | head
...
Traceback (most recent call last):
  File ".../bin/warcio", line 33, in <module>
    sys.exit(load_entry_point('warcio==1.7.4', 'console_scripts', 'warcio')())
  File ".../lib/python3.9/site-packages/warcio-1.7.4-py3.9.egg/warcio/cli.py", line 55, in main
  File ".../lib/python3.9/site-packages/warcio-1.7.4-py3.9.egg/warcio/cli.py", line 68, in indexer
  File ".../lib/python3.9/site-packages/warcio-1.7.4-py3.9.egg/warcio/indexer.py", line 33, in process_all
  File ".../lib/python3.9/site-packages/warcio-1.7.4-py3.9.egg/warcio/indexer.py", line 41, in process_one
  File ".../lib/python3.9/site-packages/warcio-1.7.4-py3.9.egg/warcio/indexer.py", line 53, in process_index_entry
  File ".../lib/python3.9/site-packages/warcio-1.7.4-py3.9.egg/warcio/indexer.py", line 87, in _write_line
BrokenPipeError: [Errno 32] Broken pipe

# alternatively
$> warcio index -o >(head) not-too-small.warc.gz

@wumpus
Copy link
Collaborator

wumpus commented Oct 21, 2021

Sebastian, most Unix things do print the EPIPE, and it's sometimes very helpful when the indexing process has a separate error logfile from the other end of the pipe.

What situation do you have in mind for making it a silent failure?

@sebastian-nagel
Copy link
Author

What situation do you have in mind for making it a silent failure?

Using the indexer output as input for head or grep.

Of course, the indexer could show a short message about the "end of pipe" signal. But a stack trace isn't really useful, a broken pipe signal is hardly an issue of the indexer.

most Unix things do print the EPIPE

Good point. Yes, should exit with EPIPE. I'll update the PR. Thanks!

Usually (unless configured otherwise, eg. set -o pipefail for the bash), the shell ignores this error code for commands in the pipeline except the last one. But I'm not aware of any Unix tool (cat, sort, grep, etc.) which "prints" a message that it received a SIGPIPE. In any case, a stack trace is hardly useful as the reason for the SIGPIPE is outside warcio.

@wumpus
Copy link
Collaborator

wumpus commented Jan 10, 2022

I think I'm remembering EPIPE being useful for things like scp and other network tools. But you're right, normal on-node tools like head have always been silent about EPIPE.

This is kind of similar to the situation of ^C being a silent situation for everything but Python.

@sebastian-nagel
Copy link
Author

Yes, Python is special here:

But also Unix tools handling report the SIGPIPE in their exit value as 128 + 13 = 141 (SIGPIPE is defined as 13 in /usr/include/asm-generic/signal.h):

bash -c '/bin/seq 50000 | head >/dev/null; echo ${PIPESTATUS[@]} "=>" $?'
141 0 => 0

If the shell is configured with pipefail, the entire pipeline fails:

bash -c 'set -o pipefail; /bin/seq 50000 | head >/dev/null; echo ${PIPESTATUS[@]} "=>" $?'
141 0 => 141

But I'm still not sure whether the exit value 141 is appropriate for a Python program - that was the last point I thought about. Btw., the post effective handling of the SIGPIPE signal provides a good summary of the problem.

@sebastian-nagel
Copy link
Author

Note: I'm also fine with a short error message BrokenPipeError: [Errno 32] Broken pipe but without the long stacktrace.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants