Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cache_disabled context manager #105

Closed

Conversation

parkerhancock
Copy link
Collaborator

I'm a recent httpx convert from requests, and I love the excellent requests-cache library. One of the features I miss the most from hishel (and use quite frequently with requests-cache) is the ability to use the .cache_disabled context manager, documented here.

After getting a bit more comfortable with this code base in my last PR, I wanted to submit a new one to add this feature into the library. This PR includes:

  • Updates to both CacheClient and AsyncCacheClient
  • Test coverage
  • Update to docs to document the feature
  • Edit to changelog

CHANGELOG.md Outdated Show resolved Hide resolved
@parkerhancock
Copy link
Collaborator Author

Added one more update here. Now the CacheClient and AsyncCacheClient have a .disabled property that indicates whether they are disabled, and no error is raised if you disable the cache in multiple places in nested code. This happens for me when I have an API using CacheClient that calls an authentication endpoint (which I want to use cache_disabled for), but then occasionally I want to disable the cache for the entire API, which currently throws an error when you try to double-unwrap the transports.

This new commit fixes that issue, and now you could call infinite .cache_disabled(), and it will function as expected without errors. E.g.:

import hishel
with hishel.CacheClient() as client:
    with client.cache_disabled():
        with client.cache_disabled():
            with client.cache_disabled():
                response = client.get("https://www.example.com") # Works!

CHANGELOG.md Outdated Show resolved Hide resolved
@karpetrosyan karpetrosyan added the enhancement New feature or request label Nov 10, 2023
@karpetrosyan
Copy link
Owner

I'm wondering if we really need this change.

Including this in the CacheClient class, for example, will ignore the support for the httpcore library, which is not what we want.

Also, as stated in https://hishel.com/userguide/#clients-and-transports, we do not recommend using the CacheClient and AsyncCacheClient classes, so adding a feature only for these classes makes no sense.

Instead, we can use the functionality that hishel already provides and that is mentioned in the specification.

no-store and max-age request directives

If the no-store request directive is present in the request headers, the cache will not save the response.

no-store: RFC Docs
max-age: RFC Docs

import hishel

cl = hishel.CacheClient()

cl.get("https://hishel.com", headers=[('Cache-Control', 'no-store')])
cl.get("https://hishel.com").extensions['from_cache']  # False

Also, we don't just want to avoid storing the incoming response in the cache. We also want to ignore the existing response that is in cache, so here's how we can do it:

cl.get("https://hishel.com", headers=[('Cache-Control', 'no-store', 'max-age=0')])

The no-store request directive prevents the response from being saved into the cache, and the max-age directive prevents the response from being taken from the cache.

@parkerhancock
Copy link
Collaborator Author

Hey! If this isn't the right fit for this library, no worries! I've implemented this feature in a subclass of hishel.AsyncCacheClient in the library I maintain here session.py. It's a subclass with some preferred defaults, and an extra method or two for convenience.

CacheClient and AsyncCacheClient are Probably The Most Popular, If Not Preferred, API's

That said, I think the reality of most people coming to Hishel is that they've discovered httpx, and they want a caching layer, kind of like requests-cache or cache-control. HTTP clients are prolific in python development, and most python devs don't have a fine-grained knowledge of RFC9111 (or even that it exists!). They just want to be kinder to API's they use and/or limit bandwidth use.

Both requests-cache and cache-control both have dead-simple API's that are similar to the Hishel CacheClient classes. So even if it isn't preferred, I think those still should be first-class API's for this library. For example:

# requests-cache
from requests_cache import CachedSession

session = CachedSession()

# cache-control
import requests
from cachecontrol import CacheControl

session = CacheControl(requests.Session())

So, I think CacheClient and AsyncCacheClient are probably this library's main API - even if you disagree.

This Method Makes Turning The Cache On And Off Dead Simple

In any case, I think this change actually makes a lot of sense - even with the transports. Think about it this way - you can use Hishel in two main ways:

# Using Hishel as an on-by-default cache client with exceptions:
from hishel import CacheClient

client = CacheClient
response = client.get("https://www.example.com") # Uses Cache
with client.cache_disabled():
    response = client.get("https://www.example.com") # Temporarily disables cache.

# Using Hishel as an off-by-default cache client with exceptions
from httpx import Client
from hishel import CacheTransport

cached_transport = CacheTransport(transport=httpx.HTTPTransport())
client = Client()
response = client.get("https://www.example.com") # No cache
client._transport = cached_transport
response = client.get("https://www.example.com") # Cached

People Probably Don't Know About Cache Directives

Self-explanatory in view of the above. I didn't know about the no-store and max-age directives. And I suspect most python devs don't either. But that does give an alternative solution. You could have the decorator temporarily overwrite the Cache-Control header in the session with no-store and max-age, and then replace it when done. But I wouldn't prefer that, just on the off chance that there's a bug in the caching logic. Instead, as this PR does, I'd rather .cache_disabled essentially just give me a bare httpx.Client free and clear of any caching logic. Which is what this does.

Conclusion

Again, thanks so much for maintaining this library! It's been great, and I'm happy to use it in my projects. No hard feelings if you reject this PR. But for the requests refugees that are bound to come upon this library, I think this little utility feature might be handy.

Thanks!

@karpetrosyan
Copy link
Owner

karpetrosyan commented Nov 11, 2023

Thank you very much for this pull request. I appreciate you being involved in this library 🙏

So even if it isn't preferred, I think those still should be first-class API's for this library

You mentioned a very important thing here.
I have added these classes just because newcomers should be able to use this library without any knowledge of how custom transport works.

I wish I could make them more usable.
HTTPX has a straightforward way to do that, which is to use custom transports.
They have a single abstract method that should be implemented, and we can assume that the interface of BaseHTTPTransport won't be modified.

But not everyone knows how to use custom transports, and it is much easier to use the CacheClient class in any case.

I even overrided some protected methods in CacheClient, which can potentially be changed at any time, so I find that the custom transports should always be preferred, though we can provide a very user-friendly experience through CacheClient as well.

So I am happy to merge this PR as a CacheClient enhancement, but first, we should figure out why we want thread and task UNsafe solutions like this if we already can do it the right way as specified in the RFC.

I find that it could be useful to provide a high-level interface for low-level things, like cache-control directives, so if we want to include something like this in our CacheClient class, I wish it could be just a simplified API for the RFC-level things.

@parkerhancock
Copy link
Collaborator Author

Hey! Maybe there's a better way to do this for hishel that matches the structure of httpx a bit better.

What if the API was something like this?

from hishel import CacheClient

client = CacheClient()
# Do this
client.get("https://www.example.com", extensions={"cache_disabled": True})
# As a convenience extension for this
client.get("https://www.example.com", headers=[('Cache-Control', 'no-store', 'max-age=0')])

Then, you could implement some logic in the .handle_request method of the hishel transports to use the extension to create the substitute headers for that request. And you wouldn't need to change the client classes at all - it would be baked into the transports at the core of the RFC implementation. And if we make sure that there's test coverage for it, that should alleviate any concerns that the RFC implementation isn't respecting the cache_disabled directive. That should also address the thread-safety concerns.

Thoughts?

@parkerhancock
Copy link
Collaborator Author

One more thought - I think the core goal of this change is to make something in the code that looks like "disable the cache, please!" so that the python code is understandable and maintainable, which the RFC-compliant headers are not. So another way to do it would be something like this:

# importable from hishel
NO_SAVE = {"Cache-Control": "no-store"}
CACHE_DISABLED = {"Cache-control": ["no-store", "max-age=0"]}

import hishel
client = hishel.CacheClient()
# Disable cache using named headers
client.get("https://www.example.com", headers=CACHE_DISABLED)
# Disable saving using named headers
client.get("https://www.example.com", headers=NO_SAVE)
# Adding additional headers (using PEP 584 syntax, but any dict merging technique could be used)
client.get("https:/www.example.com", headers=CACHE_DISABLED | {"header": "value"}))

Honestly, at this point, I think this is how I'm going to implement this feature in my own code. But it still might be nice to either have it in Hishel's code base, or suggested somewhere in the documentation (like a cookbook or something).

@karpetrosyan
Copy link
Owner

Hi! These solutions look really great, though I like the first one more.

We can have a section in the documentation that contains all about the usage of request and response extensions such as "from_cache", "cache_metadata" and "cache_disable".

Also, I just realized that we do not have any documentation about the existing extensions that Hishel supports, so I think it's a good opportunity to tidy up extensions and also introduce a new one.

@parkerhancock
Copy link
Collaborator Author

Awesome!

Question for you - can you write a test for the transports with a cacheable response that returns a result from the cache? I'm working on a new PR for this, but I'm running into difficulty writing a test for the transport that returns a mocked response from the cache. With that, I should be able to implement the logic in the transports to use the extensions feature mentioned above.

Thanks!

Parker

@karpetrosyan
Copy link
Owner

Question for you - can you write a test for the transports with a cacheable response that returns a result from the cache? I'm working on a new PR for this, but I'm running into difficulty writing a test for the transport that returns a mocked response from the cache. With that, I should be able to implement the logic in the transports to use the extensions feature mentioned above.

Yeah, of course.
Could you provide the minimum implementation so I can understand what kind of test you needed?
Then we can temporarily remove the implementation, ensure that tests fail and coverage fails in the right place, and then work on the implementation.

@parkerhancock
Copy link
Collaborator Author

Sure! Here is what I'm trying (and failing) at. I'm sure the problem is more obvious to you:

def test_transport_with_cache_disabled_extension(
    use_temp_dir,
):
    with hishel.MockTransport() as transport:
        transport.add_responses(
            [
                httpx.Response(200,),
                httpx.Response(200,),
            ]
        )
        with hishel.CacheTransport(transport=transport) as cache_transport:
            request = httpx.Request("GET", "https://www.example.com")
            # This should create a cache entry
            cache_transport.handle_request(request)
            # This should return from cache
            response = cache_transport.handle_request(request) 
            assert response.extensions["from_cache"]
            # This should ignore the cache
            response = cache_transport.handle_request(request, extensions={"cache_disabled": True}) 
            assert not response.extensions["from_cache"]

Test should fail. When I write the code to add the appropriate headers, it should pass.

One more thought for your consideration. Would you prefer a single "extension" for "cache_disabled" that can be set true or false (which is all I'm looking for), or maybe a "cache_mode" that takes a string or enumerable that could be "disabled," "no_store", or "enabled" (the default?).

Thanks!

@karpetrosyan
Copy link
Owner

Here is an example of how you can mock the caching responses.

from hishel._utils import BaseClock


def test_transport_with_cache_disabled_extension(use_temp_dir):
    class MockedClock(BaseClock):
        def now(self) -> int:
            return 1440504001  # Mon, 25 Aug 2015 12:00:01 GMT

    cachable_response = httpx.Response(
        200,
        headers=[
            (b"Cache-Control", b"max-age=3600"),
            (b"Date", b"Mon, 25 Aug 2015 12:00:00 GMT"),  # 1 second before the clock
        ],
    )

    with hishel.MockTransport() as transport:
        transport.add_responses(
            [
                cachable_response,
                httpx.Response(201)
            ]
        )
        with hishel.CacheTransport(
            transport=transport, controller=hishel.Controller(clock=MockedClock())
        ) as cache_transport:
            request = httpx.Request("GET", "https://www.example.com")
            # This should create a cache entry
            cache_transport.handle_request(request)
            # This should return from cache
            response = cache_transport.handle_request(request)
            assert response.extensions["from_cache"]
            # This should ignore the cache
            caching_disabled_request = httpx.Request(
                "GET", "https://www.example.com", headers={"Cache-Control": "max-age=0"}
            )
            response = cache_transport.handle_request(caching_disabled_request)
            assert not response.extensions["from_cache"]
            assert response.status_code == 201

Also, there is a test in the controllers test that uses such an approach.

def test_construct_response_from_cache_fresh():
class MockedClock(BaseClock):
def now(self) -> int:
return 1440504000
controller = Controller(clock=MockedClock())
response = Response(
status=200,
headers=[
(b"Cache-Control", b"max-age=3600"),
(b"Date", b"Mon, 25 Aug 2015 12:00:00 GMT"),
],
)
original_request = Request("GET", "https://example.com")
request = Request("GET", "https://example.com")
assert response is controller.construct_response_from_cache(
request=request, response=response, original_request=original_request
)

Would you prefer a single "extension" for "cache_disabled" that can be set true or false (which is all I'm looking for), or maybe a "cache_mode" that takes a string or enumerable that could be "disabled," "no_store", or "enabled" (the default?).

Very good question, though I think it should be up to you as the author of this feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants