[ENH] Connection pool FD leak v2 #2014

Merged: tazarov merged 5 commits into main from trayan-04-14-feat_connection_pool_fd_leak_v2 on Jul 26, 2024

Conversation

Contributor

@tazarov tazarov commented Apr 14, 2024

Closes #1379

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • Using weakrefs in pools' connections set.
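A minimal sketch of the idea in the bullet above, not the code from this PR: a pool that tracks its connections in a `weakref.WeakSet`, so the pool's bookkeeping alone never keeps a connection alive. `FakeConnection` and `WeakPool` are hypothetical names used only for illustration.

```python
import gc
import weakref


class FakeConnection:
    """Hypothetical stand-in for a pooled database connection."""

    def __init__(self, name: str) -> None:
        self.name = name


class WeakPool:
    """Hypothetical pool that only holds weak references to its connections."""

    def __init__(self) -> None:
        # A WeakSet drops an entry automatically once nothing else
        # references the object it points to.
        self._connections = weakref.WeakSet()

    def connect(self, name: str) -> FakeConnection:
        conn = FakeConnection(name)
        self._connections.add(conn)
        return conn

    def open_count(self) -> int:
        return len(self._connections)


pool = WeakPool()
conn = pool.connect("a")
count_while_held = pool.open_count()   # the caller still holds `conn`

del conn       # drop the only strong reference
gc.collect()   # not strictly needed on CPython here, but makes the intent explicit
count_after_release = pool.open_count()
```

With a regular `set` in place of the `WeakSet`, the pool itself would keep every connection reachable, and the count would never drop.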

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js

Documentation Changes

N/A

Refs

Contributor Author

tazarov commented Apr 14, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.


Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@tazarov
Contributor Author

tazarov commented Apr 14, 2024

Needs a test to verify that the file descriptors are being closed.

Tests might be flaky due to weakrefs relying on GC.
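To illustrate the flakiness concern, here is a small sketch (an assumption about CPython behavior, not a test from this PR): an object caught in a reference cycle is only reclaimed when the cyclic garbage collector runs, so a test that checks a weak reference can make the outcome deterministic by calling `gc.collect()` first. `Node` is a hypothetical class.

```python
import gc
import weakref


class Node:
    """Hypothetical object that references itself, forming a cycle."""

    def __init__(self) -> None:
        self.me = self


gc.disable()  # rule out an automatic collection during the demo
try:
    node = Node()
    ref = weakref.ref(node)
    del node

    # Reference counting alone cannot free a cycle, so the object lingers.
    alive_before_collect = ref() is not None

    # An explicit collection breaks the cycle and clears the weak reference.
    gc.collect()
    alive_after_collect = ref() is not None
finally:
    gc.enable()
```

A test that asserts on weakly referenced connections can call `gc.collect()` before the assertion for the same reason.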

@tazarov tazarov force-pushed the trayan-04-14-feat_connection_pool_fd_leak_v2 branch from 3a969a8 to 2615912 Compare April 17, 2024 11:24
@tazarov tazarov mentioned this pull request Apr 17, 2024
1 task
Collaborator

@HammadB HammadB left a comment


Can we add a screenshot showing the FD count going down?

@tazarov
Contributor Author

tazarov commented Apr 19, 2024

Can we add a screenshot showing the FD count going down?

@HammadB here's a short video to demo the PR:

https://www.loom.com/share/0d5fe43c9183439f9261381651b35ec5?sid=8fd8330f-6db9-4b5d-be23-637888314d5a
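For readers who want to watch the FD count themselves, a rough sketch of one way to do it from inside the process. It assumes a Unix-like system that exposes `/proc/self/fd` (Linux) or `/dev/fd` (macOS, BSD); `count_open_fds` is a hypothetical helper, not part of the project.

```python
import os


def count_open_fds() -> int:
    """Count this process's open file descriptors (Unix-like systems only)."""
    for path in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(path):
            # Listing the directory briefly opens a descriptor of its own,
            # but that affects every call equally, so differences are reliable.
            return len(os.listdir(path))
    raise RuntimeError("no per-process fd directory found on this platform")


before = count_open_fds()

fd = os.open(os.devnull, os.O_RDONLY)  # open one extra descriptor
during = count_open_fds()

os.close(fd)
after = count_open_fds()

print(f"before={before} during={during} after={after}")
```

Sampling this value before and after a batch of requests gives a quick signal of whether descriptors are being released.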

@pinsisong

May I ask when this will be closed?

@tazarov tazarov force-pushed the trayan-04-14-feat_connection_pool_fd_leak_v2 branch from 2615912 to 27b8594 Compare July 22, 2024 12:14
@tazarov tazarov requested a review from codetheweb July 24, 2024 06:55
Contributor

@codetheweb codetheweb left a comment


do you know why this was happening? If I'm understanding correctly this makes it more likely for connections to be GCed but may not fix the underlying bug
it seems like this may result in a new connection being created for each transaction?
just want to understand more how this works since I ran into a similar issue a few months ago and never fully tracked it down

@tazarov
Contributor Author

tazarov commented Jul 25, 2024

do you know why this was happening? If I'm understanding correctly this makes it more likely for connections to be GCed but may not fix the underlying bug
it seems like this may result in a new connection being created for each transaction?
just want to understand more how this works since I ran into a similar issue a few months ago and never fully tracked it down

The root of the issue is that the thread locals we use for tracking connections do not work well (think: deterministically) with async contexts, which FastAPI uses under the hood. This is what https://peps.python.org/pep-0567/ attempts to solve with contextvars. The PerThreadConnection pool leaks connections because the thread locals get recycled by asyncio (occasionally), while we still keep references to each connection in the connections field, so the connections are never released.
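A small sketch of one way thread locals and async code mismatch (an illustration only; it does not reproduce the exact recycling behavior described above). Two asyncio tasks run on the same thread, so they share one `threading.local` slot, while a `ContextVar` keeps a separate value per task. The names `handler`, `_local`, and `_ctx` are hypothetical.

```python
import asyncio
import contextvars
import threading

_local = threading.local()
_ctx = contextvars.ContextVar("conn_name", default="unset")


async def handler(name: str, results: dict) -> None:
    # Record "our" connection name in both storage mechanisms.
    _local.conn_name = name
    _ctx.set(name)

    # Yield to the event loop so the other task gets to run in between.
    await asyncio.sleep(0)

    # Read the values back after the other task has had its turn.
    results[name] = (_local.conn_name, _ctx.get())


async def main() -> dict:
    results: dict = {}
    await asyncio.gather(handler("a", results), handler("b", results))
    return results


results = asyncio.run(main())
print(results)
```

Task "a" reads back task "b"'s value from the thread local but its own value from the context variable, which is the kind of non-determinism thread-local connection tracking runs into under an event loop.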

I had a prior implementation with contextvars, but that introduces challenges of its own: the contextvars should ideally be defined at the top of the call stack in FastAPI (this is what Depends() tries to solve), and instead of PerThread we should have a proper connection pool with check-in/check-out mechanics for connections. This PR is not aimed at solving the issue at its core but at ensuring system stability in the short term.
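As a rough sketch of what check-in/check-out mechanics could look like (an assumption for illustration, not a design proposed in this PR): a fixed set of connections sits in a queue, a caller borrows one for the duration of a block, and it is always returned afterwards. `CheckoutPool` and its methods are hypothetical.

```python
import contextlib
import queue
import sqlite3
from typing import Iterator


class CheckoutPool:
    """Hypothetical fixed-size pool with explicit check-out and check-in."""

    def __init__(self, db_path: str, size: int) -> None:
        self._idle: queue.Queue = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a connection be used by whichever
            # thread checks it out; the pool ensures one user at a time.
            self._idle.put(sqlite3.connect(db_path, check_same_thread=False))

    @contextlib.contextmanager
    def checkout(self, timeout: float = 5.0) -> Iterator[sqlite3.Connection]:
        conn = self._idle.get(timeout=timeout)  # blocks if the pool is exhausted
        try:
            yield conn
        finally:
            self._idle.put(conn)  # check the connection back in, even on error

    def idle_count(self) -> int:
        return self._idle.qsize()

    def close(self) -> None:
        while not self._idle.empty():
            self._idle.get_nowait().close()


pool = CheckoutPool(":memory:", size=2)
with pool.checkout() as conn:
    value = conn.execute("SELECT 1").fetchone()[0]
    idle_during = pool.idle_count()
idle_after = pool.idle_count()
pool.close()
```

Because the number of connections is bounded up front, the number of file descriptors held by the pool is bounded too, independent of how threads or tasks come and go.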

@codetheweb
Contributor

I see, thank you for the great explanation. :)

@tazarov tazarov force-pushed the trayan-04-14-feat_connection_pool_fd_leak_v2 branch from a804043 to 0149135 Compare July 26, 2024 09:43
@tazarov tazarov force-pushed the trayan-04-14-feat_connection_pool_fd_leak_v2 branch from 0149135 to 1c0c615 Compare July 26, 2024 10:53
@tazarov tazarov merged commit 4b2a033 into main Jul 26, 2024
66 checks passed
@tazarov tazarov deleted the trayan-04-14-feat_connection_pool_fd_leak_v2 branch November 3, 2024 06:43
Development

Successfully merging this pull request may close these issues.

[Bug]: OSError: [Errno 24] Too many open files:
4 participants