CoreConnectTimeoutError on startup when too many active downloads #7032

Closed
kozlovsky opened this issue Sep 6, 2022 · 2 comments · Fixed by #7054

@kozlovsky
Contributor

The latest Tribler may fail with CoreConnectTimeoutError on startup when there are too many active downloads.
The underlying error when connecting to the Core is ConnectionRefusedError, which means that the REST API has not started yet.
At the moment of the timeout, the Core has no exceptions in its error log and appears to work normally.

The reason for this behavior is a dependency between components. RESTComponent waits for LibtorrentComponent to start, and LibtorrentComponent does not start until it has loaded all checkpoints. If a user has a significant number of active downloads (say, 1000), the REST API may not be able to start before the timeout in EventRequestManager expires.

Previous versions of Tribler (7.11, 7.12) also have this problem with the slow start of the REST API. But they also have another bug: the timeout in EventRequestManager does not work correctly. These two bugs happen to compensate for each other, so the user can wait until Tribler starts successfully, albeit with a "Loading" page showing for a very long time.

To fix the problem, the Tribler REST API should start without waiting for LibtorrentComponent to load all checkpoints, and the checkpoints should be loaded in the background.

With this change, Tribler will start much faster with many downloads. It is also necessary to change the UI of the "Downloads" page to inform users that not all downloads are displayed in the UI yet.
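
To make the proposal concrete, here is a minimal asyncio sketch of background checkpoint loading. It is not Tribler's actual code: `DownloadManagerSketch`, its methods, and the `all_loaded` status field are hypothetical names used only for illustration, and the real restore work is replaced by a placeholder.

```python
import asyncio


class DownloadManagerSketch:
    """Hypothetical stand-in for the component that restores downloads."""

    def __init__(self, checkpoints):
        self.checkpoints = list(checkpoints)
        self.downloads = []
        self.all_checkpoints_loaded = False
        self._load_task = None

    async def _load_one(self, checkpoint):
        # Placeholder for the real (slow) work of restoring one download.
        await asyncio.sleep(0)
        self.downloads.append(checkpoint)

    async def _load_all(self):
        for checkpoint in self.checkpoints:
            await self._load_one(checkpoint)
        self.all_checkpoints_loaded = True

    async def start(self):
        # Schedule loading and return right away, so that anything
        # waiting on start() (for example the REST API) is not blocked.
        self._load_task = asyncio.create_task(self._load_all())

    async def wait_until_loaded(self):
        await self._load_task

    def status(self):
        # What a "Downloads" page could use to show that the list is partial.
        return {
            "loaded": len(self.downloads),
            "total": len(self.checkpoints),
            "all_loaded": self.all_checkpoints_loaded,
        }


async def demo():
    manager = DownloadManagerSketch(f"checkpoint-{i}" for i in range(1000))
    await manager.start()              # returns before any checkpoint is loaded
    early = manager.status()
    await manager.wait_until_loaded()  # only the demo waits; the API would not
    return early, manager.status()
```

With something like the `all_loaded` flag exposed through the REST API, the GUI could display the downloads loaded so far together with a note that more are still being restored.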

@sentry-for-tribler

Sentry issue: TRIBLER-J3

@kozlovsky
Contributor Author

This Sentry issue helped to track down the last remaining cause of CoreConnectTimeoutError in 7.12.1-RC4:

Currently, EventRequestManager starts tracking the request start time not when it sends a request to the Core but at the moment when the EventRequestManager instance is created. When Tribler upgrades to a new version, it copies a potentially very big state directory, and the upgrade duration is mistakenly counted towards the timeout. When the upgrade is finished and the Core is eventually started, EventRequestManager immediately sends a request to the Core REST API. On some systems, it immediately gets ConnectionRefusedError because the REST API has not been able to start yet. The event request manager then does not try to re-send the request, as it mistakenly believes that the timeout has already elapsed.

To fix the bug, EventRequestManager should measure the timeout not from the moment when the EventRequestManager instance is created but from the moment the actual request to the Core is issued.
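
The fix described above can be sketched as follows. This is an illustration only, not the actual EventRequestManager code: `ConnectTimeoutTracker`, its method names, and the 120-second timeout are assumptions, and the clock is injectable so the behavior can be checked without real waiting.

```python
import time


class ConnectTimeoutTracker:
    """Hypothetical helper: measures the timeout from the first request."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        # Deliberately NOT initialized here: time spent before the first
        # request (for example a long upgrade) must not count.
        self._first_request_time = None

    def on_request_sent(self):
        # Remember only the time of the first connection attempt.
        if self._first_request_time is None:
            self._first_request_time = self._clock()

    def should_retry(self):
        if self._first_request_time is None:
            return True  # nothing has been sent yet
        elapsed = self._clock() - self._first_request_time
        return elapsed < self.timeout_seconds


# Simulated timeline with a fake clock (all values in seconds).
now = [0.0]
tracker = ConnectTimeoutTracker(120, clock=lambda: now[0])  # 120 is an assumed value

now[0] = 600.0                 # a ten-minute upgrade happens first
tracker.on_request_sent()      # first request; connection is refused
now[0] = 601.0
retry_after_upgrade = tracker.should_retry()   # still within the timeout
now[0] = 721.0
retry_after_timeout = tracker.should_retry()   # timeout has now elapsed
```

Because the start time is recorded in `on_request_sent` rather than in the constructor, a ConnectionRefusedError received right after a long upgrade still leaves the full timeout window for retries.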
