
CoreConnectTimeoutError: Could not connect with the Tribler Core within 120 seconds: ConnectionRefusedError (code 1) #7137

Closed
sentry-for-tribler bot opened this issue Nov 2, 2022 · 10 comments · Fixed by #7251 or #7915

Comments

@sentry-for-tribler

sentry-for-tribler bot commented Nov 2, 2022

Sentry Issue: TRIBLER-J3

CreationTraceback: 
  File "run_tribler.py", line 100, in <module>
  File "tribler\gui\start_gui.py", line 77, in run_gui
  File "tribler\gui\utilities.py", line 411, in trackback_wrapper
  File "tribler\gui\event_request_manager.py", line 175, in reconnect
  File "tribler\gui\event_request_manager.py", line 188, in _connect

CoreConnectTimeoutError: Could not connect with the Tribler Core within 120 seconds: ConnectionRefusedError (code 1)
  File "tribler\gui\utilities.py", line 414, in trackback_wrapper
  File "tribler\gui\utilities.py", line 411, in trackback_wrapper
  File "tribler\gui\event_request_manager.py", line 188, in <lambda>
  File "tribler\gui\event_request_manager.py", line 125, in on_error

Last Core output:

[PID:17180] 2022-11-01 10:36:58,200 - INFO - ProcessChecker(42) - Check
[PID:17180] 2022-11-01 10:36:58,200 - INFO - ProcessChecker(85) - Get PID from the lock file
[PID:17180] 2022-11-01 10:36:58,200 - WARNING - ProcessChecker(91) - [Errno 2] No such file or directory: 'C:\Users\\AppData\Roaming\.Tribler\triblerd.lock'
[PID:17180] 2022-11-01 10:36:58,477 - INFO - ProcessChecker(98) - Check process cmd: c:\program files\tribler\tribler.exe--core
[PID:17180] 2022-11-01 10:36:58,477 - INFO - Proc...
[PID:17180] 2022-11-01 10:36:58,484 - INFO - tribler.core.check_os(109) - Check and enable code tracing. Process name: "core". Log dir: "C:\Users<user>\AppData\Roaming.Tribler\7.12\log"

Related: #7032

@xoriole xoriole self-assigned this Nov 2, 2022
@xoriole
Contributor

xoriole commented Nov 2, 2022

The last Core output suggests that the Core is running. Given the ConnectionRefusedError (code 1), the GUI is unable to connect to the Core, most likely because:

  1. The Core is not running on the port the GUI is trying to connect to.
  2. The Core cannot handle incoming client requests and is rejecting them (less likely).

The first case can happen if retry_port is enabled and the Core starts on another port within the retry limit (10 by default). Why the Core cannot bind to the allocated port still needs to be investigated.
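A minimal sketch, in Python, of how a retry-port bind can leave the Core listening on a different port than the GUI expects. The helper name bind_with_retry and the exact semantics (try the requested port, then up to retry_limit following ports) are assumptions for illustration, not Tribler's actual REST Manager code.

```python
import socket


def bind_with_retry(host: str, port: int, retry_limit: int = 10) -> socket.socket:
    """Hypothetical helper: bind a listening TCP socket to `port`, or to one of
    the next `retry_limit` ports if the requested one is already taken."""
    last_error = None
    # Try the requested port first, then port + 1 ... port + retry_limit.
    for candidate in range(port, min(port + retry_limit, 65535) + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, candidate))
            sock.listen()
            # May be a later port that the GUI was never told about.
            return sock
        except OSError as error:
            sock.close()
            last_error = error
    raise OSError(f"no free port in {port}..{port + retry_limit}") from last_error
```

If the requested port is already taken (for example by a leftover Core), the returned socket's getsockname()[1] is a later port, which matches the first case listed above.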

To confirm this suspicion, we could enrich the Sentry report to include:

  1. The Core port passed via an environment variable when spawning the Core
  2. The actual port the REST Manager starts on in the Core
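A minimal sketch of the extra context such a report could carry. The environment variable name CORE_API_PORT and the helper name core_port_context are assumptions for illustration; the resulting dict could then be attached to the report, for example with sentry_sdk.set_context.

```python
import os
from typing import Mapping, Optional


def core_port_context(actual_port: int,
                      env: Mapping[str, str] = os.environ,
                      var_name: str = "CORE_API_PORT") -> dict:
    """Hypothetical helper: compare the port the GUI passed to the Core through
    the environment with the port the REST Manager actually bound."""
    raw = env.get(var_name)
    requested: Optional[int] = int(raw) if raw and raw.isdigit() else None
    return {
        "requested_port": requested,
        "actual_port": actual_port,
        # Only flag a mismatch when the requested port is known.
        "port_mismatch": requested is not None and requested != actual_port,
    }
```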

Further, as a temporary (admittedly ugly) workaround, the GUI could try to connect to the Core on the next 10 ports if it fails to connect within the first 2 minutes. If that also fails, it would fail just as it does now.
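A rough sketch of that GUI-side workaround, assuming the Core would land on one of the next count ports after the expected one. probe_fallback_ports is a hypothetical name; a real implementation would live in the GUI's request manager.

```python
import socket
from typing import Optional


def probe_fallback_ports(host: str, base_port: int, count: int = 10,
                         timeout: float = 0.5) -> Optional[int]:
    """Hypothetical helper: return the first port in base_port+1 .. base_port+count
    that accepts a TCP connection, or None if none of them does."""
    for port in range(base_port + 1, min(base_port + count, 65535) + 1):
        try:
            # Connection refused or timed out raises OSError; try the next port.
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            continue
    return None
```

A successful TCP connect only shows that something is listening on that port, so the GUI would still need to confirm (for example through an API request) that it is talking to the Tribler Core and not an unrelated service.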

@xoriole
Contributor

xoriole commented Nov 2, 2022

It seems possible to get the connection ports of a process with a given PID, so the actual Core port can be obtained. This port can then be compared with the port set by the GUI to detect any difference and update the request manager accordingly. This should likely fix the issue.
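A sketch of that idea using psutil (an assumption; the comment does not name a library). Because other Core components may also listen on TCP ports, the hypothetical reconcile_core_port helper only accepts ports inside the retry window after the expected port, preferring the expected port itself. psutil 6.0 renamed Process.connections() to Process.net_connections(), so the sketch handles both.

```python
from typing import Iterable, Optional, Set


def listening_tcp_ports(pid: int) -> Set[int]:
    """Return the TCP ports on which process `pid` is listening (requires psutil)."""
    import psutil  # Imported lazily so reconcile_core_port works without psutil.

    proc = psutil.Process(pid)
    # psutil >= 6.0 provides net_connections(); older versions only connections().
    get_conns = getattr(proc, "net_connections", None) or proc.connections
    return {c.laddr.port for c in get_conns(kind="tcp") if c.status == psutil.CONN_LISTEN}


def reconcile_core_port(expected: int, listening: Iterable[int],
                        retry_limit: int = 10) -> Optional[int]:
    """Hypothetical policy: keep the expected port if the Core listens on it,
    otherwise take the lowest listening port within the retry window."""
    candidates = sorted(p for p in listening if expected <= p <= expected + retry_limit)
    return candidates[0] if candidates else None
```

Usage: reconcile_core_port(gui_port, listening_tcp_ports(core_pid)) returns the port the request manager should switch to, or None if the Core is not listening anywhere in the window.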

@drew2a
Contributor

drew2a commented Nov 9, 2022

Related to #7065

@synctext
Member

synctext commented Jan 9, 2023

This 120-second error is just one symptom, which we hope is fixed now.
Two instances of the Tribler Core, or two GUIs, could run at the same time! Fixed.
Another mysterious error this could cause: both instances trying to open the same database file, leading to strange errors and Sentry reports.

@drew2a
Contributor

drew2a commented Jan 3, 2024

The issue is still present: Sentry Issue #1352, occurring in the most recent release, 7.13.1.

[Screenshot of the Sentry issue: EVENTS 1k, USERS 15]

@drew2a drew2a reopened this Jan 3, 2024
@kozlovsky
Contributor

We can start using a "shift-right testing" approach by adding a new checkbox to the error report dialog: "next time, gather detailed information about the error". This way, the user can allow Tribler to gather and send an extended error report, with debugging information enabled, to Sentry. This is an opt-in approach in which the user explicitly agrees to send detailed information to developers, and some users who experience the bug may be motivated enough to enable it.
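One way the proposed checkbox could work is as a one-shot flag: ticking it arms the flag, and the next run consumes it and enables detailed collection once. The class name, the JSON settings file, and the key gather_detailed_report are all hypothetical.

```python
import json
from pathlib import Path


class DetailedReportOptIn:
    """Hypothetical one-shot flag behind a 'next time, gather detailed
    information about the error' checkbox, persisted in a JSON file."""

    def __init__(self, path: Path):
        self.path = Path(path)

    def _load(self) -> dict:
        try:
            return json.loads(self.path.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            return {}

    def arm(self) -> None:
        # Called when the user ticks the checkbox in the error report dialog.
        data = self._load()
        data["gather_detailed_report"] = True
        self.path.write_text(json.dumps(data))

    def consume(self) -> bool:
        # Called on the next run: returns True once, then clears the flag so
        # detailed collection requires a fresh opt-in every time.
        data = self._load()
        armed = bool(data.pop("gather_detailed_report", False))
        if armed:
            self.path.write_text(json.dumps(data))
        return armed
```

Clearing the flag after one use keeps the extended report strictly opt-in per occurrence rather than silently enabling debugging for all future runs.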

@xoriole
Contributor

xoriole commented Feb 15, 2024

Investigating this issue further, I found no new instances of it on 7.13.1 reported for Windows. However, there are reports for the macOS and Linux versions (both Debian and Flatpak). I suspect the file-lock mechanism used to check for an existing process.

@xoriole
Contributor

xoriole commented Feb 20, 2024

Investigating the logs further, I found multiple Core processes in the running state (no exit code or finished timestamp). I was able to reproduce the issue in a Flatpak environment.

There was a previous attempt at fixing the CoreConnectTimeoutError using FileLock. This works well in the normal scenario, preventing two instances of Tribler from running at the same time. However, if the Core process is terminated without a clean exit for any reason, its entry in the process database is left in an inconsistent state. This causes the new GUI instance to wrongly select the previously terminated Core process and try to connect to that process's Core port. Eventually, the GUI fails to connect after the timeout period.

Assuming that the GUI process is started first and spawns the Core, the process entries in the process database should be sequential: the GUI rowid first, then the Core rowid. If the GUI, when looking up the Core process, checks that the Core's rowid is higher than its own rowid, the correct Core process is returned and the connection succeeds. This is proposed in PR #7915.
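A simplified sketch of both the failure and the proposed rowid check, using an in-memory SQLite table. The schema, column names, and queries are hypothetical stand-ins for Tribler's process database; they only illustrate the idea behind PR #7915 and are not its code.

```python
import sqlite3
from typing import Optional, Tuple

# Hypothetical, simplified schema; the real process database has more columns.
SCHEMA = ("CREATE TABLE processes (pid INTEGER, kind TEXT, api_port INTEGER, "
          "exit_code INTEGER, finished_at INTEGER)")
RUNNING_CORE = ("SELECT rowid, pid, api_port FROM processes "
                "WHERE kind = 'core' AND exit_code IS NULL AND finished_at IS NULL")


def register(con: sqlite3.Connection, pid: int, kind: str,
             api_port: Optional[int] = None) -> int:
    cur = con.execute("INSERT INTO processes (pid, kind, api_port) VALUES (?, ?, ?)",
                      (pid, kind, api_port))
    return cur.lastrowid


def naive_core(con: sqlite3.Connection) -> Optional[Tuple[int, int, int]]:
    # Any Core row without exit info looks "running", including a crashed one.
    return con.execute(RUNNING_CORE + " ORDER BY rowid LIMIT 1").fetchone()


def core_for_gui(con: sqlite3.Connection, gui_rowid: int) -> Optional[Tuple[int, int, int]]:
    # Rowid check: only accept a Core registered after this GUI's own row.
    return con.execute(RUNNING_CORE + " AND rowid > ? ORDER BY rowid LIMIT 1",
                       (gui_rowid,)).fetchone()


con = sqlite3.connect(":memory:")
con.execute(SCHEMA)
register(con, 100, "gui")
register(con, 101, "core", api_port=20100)  # killed uncleanly; row never updated
gui_rowid = register(con, 200, "gui")
register(con, 201, "core", api_port=20101)
print(naive_core(con))                # (2, 101, 20100): the stale Core
print(core_for_gui(con, gui_rowid))   # (4, 201, 20101): the Core this GUI spawned
```

The check relies on the stated assumption that the GUI always registers before the Core it spawns; if a Core were ever started independently of the GUI, the ordering would no longer identify the right process.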

Author

sentry-for-tribler bot commented May 2, 2024

[7.14.0] EVENTS: 5

Sentry issue: TRIBLER-1PM

@drew2a drew2a reopened this May 2, 2024
@xoriole xoriole removed their assignment Aug 19, 2024
@xoriole
Copy link
Contributor

xoriole commented Aug 19, 2024

The Core connection mechanism has changed in the new GUI, so this issue is no longer relevant. Closing for now.
