Core is stuck running after a crash, have to kill it manually #4673

Closed
vi opened this issue Jul 12, 2019 · 15 comments

Comments

@vi
Contributor

vi commented Jul 12, 2019

Tribler version/branch+revision:

v7.3.0-beta6 + merge #4670.

Operating system and version:

Linux

Steps to reproduce the behavior:
  1. Try starting Tribler with ./tribler.sh after having run it some time ago.
Expected behavior:

It starts

Actual behavior:

It crashes and suggests to send a report.

Next try it seems to start successfully.

Relevant log file output:
ERROR   1562956534.11                   Session:416  (Session)  Error in Upgrader callback chain: [Failure instance: Traceback: <class 'twisted.internet.error.CannotListenError'>: Couldn't listen on any:8085: [Errno 98] Address already in use.
Traceback (most recent call last):
  File "/home/tribler/tribler/TriblerGUI/event_request_manager.py", line 114, in on_read_data
    raise RuntimeError(json_dict["event"]["text"])
RuntimeError: Unhandled Error
Traceback (most recent call last):
  File "run_tribler.py", line 96, in <module>
    start_tribler_core(base_path, api_port)
  File "run_tribler.py", line 88, in start_tribler_core
    reactor.run()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1267, in run
    self.mainLoop()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1276, in mainLoop
    self.runUntilCurrent()
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 902, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/home/tribler/tribler/Tribler/Core/Libtorrent/LibtorrentDownloadImpl.py", line 244, in schedule_create_engine
    checkpoint_disabled=checkpoint_disabled)
  File "/home/tribler/tribler/Tribler/Core/Libtorrent/LibtorrentDownloadImpl.py", line 342, in network_create_engine_wrapper
    return self.ltmgr.add_torrent(self, atp).addCallbacks(on_torrent_added, on_torrent_failed)
exceptions.AttributeError: 'NoneType' object has no attribute 'add_torrent'
@ichorid
Contributor

ichorid commented Jul 12, 2019

Couldn't listen on any:8085: [Errno 98] Address already in use. means that another instance of Tribler Core is running in the background, possibly stuck.

Please, try to reproduce this problem. Before running ./tribler.sh for the second time, please, confirm that there is no other Tribler core process running in the background. Typically it looks like python ./run_tribler.py or something like that.
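A quick way to check whether anything is already listening on the API port is a plain TCP connect probe. This is only a minimal sketch: the helper name `is_port_in_use` is made up for illustration and is not part of Tribler, and port 8085 is taken from the log above.

```python
import socket


def is_port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper for illustration; not part of Tribler.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("port 8085 busy:", is_port_in_use(8085))
```

A positive result only says that some process holds the port; it does not tell you whether that process is a Tribler Core or a third-party application.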

@vi
Contributor Author

vi commented Jul 12, 2019

In any case, Tribler should handle attempts to start another parallel instance better.

There should be explicit user-facing message that there may be another instance running, etc. It should mention busy TCP localhost port (because the port may be busy with a third-party app, not just Tribler).

@ichorid
Contributor

ichorid commented Jul 13, 2019

There is a user-facing message for this kind of situation 😉
rise

That is why I asked you to double-check the existence of background Tribler process (and possibly provide some details on the situation that triggered this): so we know that the problem is in the already-running-checker routine.

Initially, I suspected that the process checker only checks for the existence of the GUI process and does not account for the core itself. However, this seems not to be the case. So we need more information from you on what actions led to this outcome. Could you please provide that?

@ichorid ichorid added this to the V7.3: Gigachannels milestone Jul 13, 2019
@ichorid ichorid modified the milestones: V7.3: Gigachannels, Backlog Jul 13, 2019
@vi
Contributor Author

vi commented Jul 13, 2019

I don't remember the core being running.

At least the console where I had started Tribler before was available for a new Tribler start. Maybe there was also a "Tribler shut down" message, maybe not. Some time ago I also used the "force shutdown" button (I don't remember whether that was before the bug reproduced or earlier).

@vi
Contributor Author

vi commented Jul 13, 2019

Tried explicitly occupying the port with socat tcp-l:8085,fork,reuseaddr,bind=127.0.0.1 open:/dev/null,

got "RuntimeError: Could not connect with the Tribler Core within 20 seconds" as the error-report-sendable message.

The message does not mention port 127.0.0.1:8085 and is not friendly.

The GUI should specifically expect twisted.internet.error.CannotListenError from the core by non-REST means (the process exit code if the core is a separate process, some pipe if not) and show a customized error message.
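The exit-code variant of this idea could look roughly like the sketch below. Everything in it is an assumption made for illustration: the exit code `EXIT_PORT_IN_USE`, the stand-in child script `CORE_STUB`, and the function `start_core_and_report` are hypothetical and do not reflect how Tribler actually launches its core.

```python
import subprocess
import sys

# Hypothetical exit code the core would use to signal "API port is busy".
EXIT_PORT_IN_USE = 98

# Stand-in for the core process: it tries to bind the port and exits with
# the special code when the address is already in use.
CORE_STUB = """
import errno, socket, sys
port = int(sys.argv[1])
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    sock.bind(("127.0.0.1", port))
except OSError as exc:
    sys.exit(%d if exc.errno == errno.EADDRINUSE else 1)
sock.close()
""" % EXIT_PORT_IN_USE


def start_core_and_report(port):
    """Run the stand-in core; return a user-facing message, or None on success."""
    result = subprocess.run([sys.executable, "-c", CORE_STUB, str(port)])
    if result.returncode == EXIT_PORT_IN_USE:
        return ("Cannot start Tribler Core: TCP port 127.0.0.1:%d is already in use, "
                "either by another Tribler instance or by a different application." % port)
    if result.returncode != 0:
        return "Tribler Core exited with code %d." % result.returncode
    return None
```

The point of the design is that the GUI learns about the busy port without needing the REST API, which is exactly the channel that is unavailable when the bind fails.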

An easier but racy workaround: briefly bind and then close port 8085 in the GUI before starting the Core. If that bind fails, the bind in the core would probably also fail.
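A minimal sketch of that bind-then-close probe, assuming the core listens on 127.0.0.1. The function name `can_bind` is hypothetical. SO_REUSEADDR is set on POSIX only, to mirror what the listening socket is later reported in this thread to use, so that leftover TIME_WAIT connections do not cause false alarms.

```python
import os
import socket


def can_bind(port, host="127.0.0.1"):
    """Briefly bind host:port and release it again; False if the bind fails.

    Hypothetical GUI-side pre-check. It is racy by design: another process
    may grab the port between this probe and the real bind in the core.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if os.name == "posix":
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
    except OSError:
        return False
    finally:
        sock.close()
    return True


if __name__ == "__main__":
    if not can_bind(8085):
        print("TCP port 127.0.0.1:8085 is busy; another Tribler instance "
              "or a third-party application may be using it.")
```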

@ichorid
Contributor

ichorid commented Jul 13, 2019

I don't remember the core being running.

At least the console where I had started Tribler before was available for a new Tribler start.

If the GUI crashes, the console is still available, since the Core thread runs in the background. However, the logging output of the core is printed onto the same console (but you can still run stuff on it).

@qstokkink
Contributor

This should be a simple case of setting SO_REUSEADDR on the REST API port in IPv8.

@qstokkink
Contributor

qstokkink commented Sep 5, 2019

According to https://stackoverflow.com/q/11910140 SO_REUSEADDR is already set on the socket.

So there must be an actual process listening on the port already.
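The reasoning can be checked with a small experiment. To my knowledge, on Linux SO_REUSEADDR lets a socket bind over connections lingering in TIME_WAIT, but not over a socket that is actively listening on the same address and port. The sketch below uses a hypothetical function `second_bind_errno` and an ephemeral port instead of 8085.

```python
import errno
import socket


def second_bind_errno():
    """Listen on a free port, then try to bind the same port again.

    Both sockets set SO_REUSEADDR. Returns the errno of the failed second
    bind, or None if the second bind unexpectedly succeeds.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        first.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]

        second.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            second.bind(("127.0.0.1", port))
        except OSError as exc:
            return exc.errno
        return None
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    print(second_bind_errno() == errno.EADDRINUSE)
```

Windows treats SO_REUSEADDR differently, so the result there may not match.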

@qstokkink qstokkink removed their assignment Sep 5, 2019
@ichorid ichorid changed the title Tribler crashed on start: 'NoneType' object has no attribute 'add_torrent' Core is stuck running after a crash, have to kill it manually Jan 4, 2020
@devos50
Contributor

devos50 commented Apr 1, 2020

Is this issue addressed now? I will move it to the 7.6 milestone but if it is resolved, feel free to close it 👍

@qstokkink
Contributor

I believe this is fixed.

At any rate, the OP is out of date: anything in this thread that has not been resolved needs a new issue.

@tsilvs

tsilvs commented Aug 5, 2022

Having this with 7.12 on Linux (Pop!_OS 20.04) even after deleting /home/$(echo $USER)/.Tribler/tribler.lock and killing the tribler process. I don't know exactly which other service or process I should kill as a workaround.

@kozlovsky
Contributor

@JeffRockatansky thank you for reporting! It was fixed in #6941, and the fix will be available in 7.12.1 in a few days

@tsilvs

tsilvs commented Aug 5, 2022

@kozlovsky

the fix will be available in 7.12.1 in a few days

Great news! Will definitely update. In the meantime, can you tell me the steps to work around this issue without a reboot?

@kozlovsky
Contributor

kozlovsky commented Aug 5, 2022

@JeffRockatansky what type of error do you experience if you run Tribler after deleting the .Tribler/triblerd.lock file and killing all running tribler processes? Note that the file name is triblerd.lock, not tribler.lock.

@tsilvs

tsilvs commented Aug 6, 2022

@kozlovsky
Opening a file management window (when adding a .torrent file, moving the file storage of an existing download, or setting the download path with the Browse button) provokes the crash.

The actual triblerd.lock file that my instance was using was in /home/$USER/.var/app/org.tribler.Tribler/.Tribler, because the app is installed as a Flatpak. Now the workaround works. Anyone who experiences this with v7.12 could try deleting it from there.
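For anyone unsure which lock file their install uses, here is a small sketch that looks in both locations mentioned in this thread (the regular state directory and the Flatpak one). The function name `find_lock_files` is made up for illustration, and other install methods may keep the file elsewhere.

```python
from pathlib import Path

LOCK_NAME = "triblerd.lock"

# State directories mentioned in this thread, relative to the home directory.
CANDIDATE_DIRS = (
    Path(".Tribler"),
    Path(".var/app/org.tribler.Tribler/.Tribler"),
)


def find_lock_files(home=None):
    """Return the triblerd.lock files that exist under the given home directory."""
    home = Path(home) if home is not None else Path.home()
    candidates = [home / directory / LOCK_NAME for directory in CANDIDATE_DIRS]
    return [path for path in candidates if path.is_file()]


if __name__ == "__main__":
    for lock in find_lock_files():
        print(lock)
```

The script only lists the files. Delete one only after making sure no Tribler process is still running.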


7 participants