
Race condition on 'select' UNIX call #3559

Closed
ichorid opened this issue Apr 4, 2018 · 5 comments

@ichorid
Contributor

ichorid commented Apr 4, 2018

Tribler version: 7.1 Next
OS: Ubuntu 18.04

Got this error while running a test of tunnel-community-based download.

Exception in thread StandaloneEndpoint:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/mnt/leecher/tribler/Tribler/dispersy/endpoint.py", line 187, in _loop
    read_list, write_list, _ = select(socket_list, [], [], 0.1)
error: (4, 'Interrupted system call')

The problem appears only when the exit code from

os.system('sudo tc qdisc add dev ' + networkDevice + ' root netem delay ' + str(latency) + 'ms')

is non-zero. The command itself doesn't matter: any command that returns an error triggers the problem. If the exit code of the command is insulated, e.g. by nohup:

os.system('sudo nohup tc qdisc add dev ' + networkDevice + ' root netem delay ' + str(latency) + 'ms')

the problem goes away.
If subprocess.Popen() is used instead of os.system(), the problem goes away too:

Popen("mycmd" + " myarg", shell=True).wait()
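For reference, the Popen workaround applied to the tc command from this issue might look like the sketch below. The helper names build_netem_cmd and add_latency and their parameters are hypothetical; only the tc command itself comes from the report. Passing an argument list instead of a concatenated string means no shell parses the command and the device name needs no quoting, and wait() blocks until the child exits and returns its exit code.

```python
import subprocess


def build_netem_cmd(network_device, latency):
    """Build the argv list for the tc/netem command quoted in this issue.

    Hypothetical helper: same command as the os.system() string, but as a
    list, so no shell parses it and network_device needs no quoting.
    """
    return ["sudo", "tc", "qdisc", "add", "dev", network_device,
            "root", "netem", "delay", "{}ms".format(latency)]


def add_latency(network_device, latency):
    """Run the command and return its exit code (non-zero on failure)."""
    # wait() blocks until the child exits, reaps it, and returns its code.
    return subprocess.Popen(build_netem_cmd(network_device, latency)).wait()
```

Building the list in a separate function keeps the command testable without actually running sudo.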

If sleep(5) is added after the os.system() call, the problem also goes away.
So it is definitely a race condition.

As a result, the tunnel community does not start properly, and anonymous downloading does not work.

@ichorid
Contributor Author

ichorid commented Apr 4, 2018

OK, the problem is a race condition triggered by a signal (I guess it is SIGCHLD) generated by the sub-shell started by os.system("run something"). It happens in:

File "/mnt/leecher/tribler/Tribler/dispersy/endpoint.py", line 187, in _loop
    read_list, write_list, _ = select(socket_list, [], [], 0.1)

Apparently, when the Python interpreter starts a subprocess and the process does not end correctly, it could send a SIGCHLD to its parent process.
If the Python interpreter is waiting for data in the Unix "select" call at that moment, the call is interrupted by the signal. The signal should be processed by a signal handler, and if the handler is not there, we get error: (4, 'Interrupted system call').
Introduction to the problem on a Google mailing list.
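A minimal way to observe the signal on a POSIX system is to install a Python-level SIGCHLD handler and run a failing sub-shell through os.system(). This is a sketch: run_and_watch_sigchld is a hypothetical helper, and it must be called from the main thread, because Python only allows signal handlers to be installed there. The kernel sends SIGCHLD whenever a child terminates, and SIGCHLD only interrupts a blocking call such as select() when a handler for it is installed; its default disposition is to ignore it.

```python
import os
import signal
import time


def run_and_watch_sigchld(command, wait=1.0):
    """Run `command` with os.system() and report whether SIGCHLD arrived.

    Hypothetical helper. POSIX only; must be called from the main thread.
    Returns (exit_code, sigchld_seen).
    """
    received = []

    def on_sigchld(signum, frame):
        # Record that the kernel notified us of a child state change.
        received.append(signum)

    old_handler = signal.signal(signal.SIGCHLD, on_sigchld)
    if old_handler is None:
        # None means the previous handler was not set from Python;
        # fall back to the default disposition when restoring.
        old_handler = signal.SIG_DFL
    try:
        status = os.system(command)
        # Python-level handlers run between bytecodes, so poll briefly.
        deadline = time.time() + wait
        while not received and time.time() < deadline:
            time.sleep(0.01)
    finally:
        signal.signal(signal.SIGCHLD, old_handler)
    # On POSIX, os.system() returns a wait status; extract the exit code.
    return os.WEXITSTATUS(status), bool(received)
```

If such a handler fires while another thread is blocked in select() on Python 2.7, that call can fail with the (4, 'Interrupted system call') error from the traceback.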

@ichorid
Contributor Author

ichorid commented Apr 4, 2018

It is interesting to note that the Tribler code contains almost no os.system() calls, except in the Electrum wallet section.

@devos50 devos50 added this to the V7.1: The token micro-economy milestone Apr 4, 2018
@qstokkink
Contributor

qstokkink commented Apr 5, 2018

Oh wow. Do you have any clue how to fix this?
I.e. can we catch the error and ignore this, or will it keep failing?
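One common answer to the catch-and-ignore question is to retry select() whenever it fails with EINTR, recomputing the remaining timeout. The sketch below uses a hypothetical select_retry() wrapper that the endpoint loop would call instead of select(); it is not Dispersy's code. On Python 3.5+ this retry is built into select.select() itself (PEP 475), so the wrapper only matters on Python 2.7, as in the traceback above.

```python
import errno
import select
import time


def select_retry(rlist, wlist, xlist, timeout=None):
    """select.select() that retries when a signal interrupts it (EINTR).

    Hypothetical wrapper. On Python 2.7 an interrupted select() raises
    select.error(4, 'Interrupted system call'); Python 3.5+ already
    retries internally, so there the except branch is normally not hit.
    """
    deadline = None if timeout is None else time.time() + timeout
    while True:
        try:
            return select.select(rlist, wlist, xlist, timeout)
        except select.error as exc:
            # Python 2.7: select.error is its own class, errno in args[0].
            # Python 3: select.error is OSError, args[0] is also the errno.
            if not exc.args or exc.args[0] != errno.EINTR:
                raise
            if deadline is not None:
                # Shrink the timeout so retries don't extend the total wait.
                timeout = max(0.0, deadline - time.time())
```

In the endpoint loop, the call from the traceback would then read `read_list, write_list, _ = select_retry(socket_list, [], [], 0.1)`.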

@qstokkink qstokkink modified the milestones: V7.1: The token micro-economy, Backlog Apr 5, 2018
@qstokkink
Contributor

As there is a workaround for @ichorid and this is probably not a major issue for normal users, I moved this to the backlog.

@ichorid ichorid self-assigned this May 7, 2018
@ichorid
Contributor Author

ichorid commented May 30, 2018

Since we have moved to the Twisted endpoint, we don't use select anymore. Closing.

@ichorid ichorid closed this as completed May 30, 2018

3 participants