Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overall revamp of connection pool, retries and timeouts #3008

Open
achimnol opened this issue Oct 16, 2023 · 6 comments
Open

Overall revamp of connection pool, retries and timeouts #3008

achimnol opened this issue Oct 16, 2023 · 6 comments

Comments

@achimnol
Copy link

achimnol commented Oct 16, 2023

I'm experiencing many issues (already reported here by other people) related with the (blocking) connection pool in the asyncio version.

I had to rollback and pin my dependency version to 4.5.5 currently. (lablup/backend.ai#1620)

Since there are already many reports, here I would instead suggest some high-level design suggestions:

@achimnol achimnol changed the title Overall revamp of connection pool and timeouts Overall revamp of connection pool, retries and timeouts Oct 16, 2023
@achimnol
Copy link
Author

I'm not sure how to fix these problems without breaking the existing users. It seems that many of the problems are intertwined and having a better design and abstraction of connection and connection pools would be the way to go.

I understand that the list may look intimidating for the maintainers—I'm not urging to fix them ASAP, but hoping that this list would be a guide to embrace the issues in a more holistic view.

@martinitus
Copy link

martinitus commented Dec 20, 2023

For a new user it is really quite hard to understand whats going on with the connection ConnectionPool, Redis, and Connection objects. - E.g. I am trying to figure out if the retry functionality works with Azure AD authentication in the case that my password (token) expires while I am holding a connection from the pool.

Another couple of suggestions.

  • use with and async with to make sure, connections that are taken from the pool will be returned.
    E.g.
async with pool.take() as redis:
   await redis.set("foo","bar")
  • directly hand the user of the pool a working client (Redis) object
  • discard and recreate connections instead of trying to fix (reconnect them). Follow RAII (resource acquisition is initialization).
    • A connection will be connected when created. and only then added to the pool
    • a failure indicates the connection is broken. it will be cleaned up and removed from the pool
    • the pool will (already) automatically create new ones if required

Just some thoughts...

@achimnol
Copy link
Author

achimnol commented Apr 9, 2024

Looking forward to what happens in #3200! 👀

@SorianoMarmol
Copy link

For a new user it is really quite hard to understand whats going on with the connection ConnectionPool, Redis, and Connection objects. - E.g. I am trying to figure out if the retry functionality works with Azure AD authentication in the case that my password (token) expires while I am holding a connection from the pool.

Do you have any clues or guidance on this? I am analyzing the same authentication, though in my case it's not just Redis, but also Celery, django-redis, python redis lock...

@mzealey
Copy link

mzealey commented Nov 12, 2024

With sentinel it becomes a whole other question also, which I cannot seem to find covered in the docs or examples and even reading the source is a bit confusing. Fundamentally what I want is a reliable object (or two if doing readonly) that I can say 'give me an active connection', but getting to that state is pretty hard...

Currently I create a sentinel, which in itself is a bit horrific:

      sentinel = aioredis.sentinel.Sentinel(  # type: ignore
          settings.redis_sentinel_hosts,
          **_standard_connection_kwargs(settings, **kwargs),
          sentinel_kwargs={
              **_standard_connection_kwargs(settings, **kwargs),
              "password": settings.redis_sentinel_password or settings.redis_password,
          },
      )

I can then use slave_for / master_for but these only seem to be single connections rather than returning a pool that I can use elsewhere. In the test suite it has only one test for SentinelConnectionPool which seems to just test that it tries a couple of hosts and then blows up. So at present I just use master_for / slave_for, but from what I can see from tcpdump on a live server this creates a new TCP conn each time rather than polling it (whereas in this case the sentinel connection is persistent).

@mattjudge
Copy link

Is it possible to get an update on where / if the above issues sit on the roadmap? Is e.g. #3200 still intended to be developed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants