Real example of unhashable arguments #91

Closed
cpatrickalves opened this issue Sep 5, 2022 · 2 comments

@cpatrickalves

Hi there,

First, thanks for this amazing project, it's really useful.

I did not understand how to use unhashable arguments.

Could you show me a real example?

How could I use it in a case like the one below?

@cachier()
def get_data(session, query):
    pass

@shaypal5
Collaborator

shaypal5 commented Sep 7, 2022

Sure!

For example, let's say you're using cachier to cache the result of your big_calc function, which takes an int and a pandas.DataFrame as input parameters:

from random import random
import pandas as pd
from cachier import cachier

from .util import _my_mongodb_getter

@cachier(mongetter=_my_mongodb_getter)
def big_calc(a: int, df: pd.DataFrame):
    """This has some big calculation inside."""
    return random()  # just placeholder code

This will either break or behave unexpectedly for some custom types. For example, two different DataFrame objects with identical content will produce different cache keys, because they are distinct objects, and will therefore trigger separate calculations.
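A quick check of the problem described above, as a minimal sketch assuming only that pandas is installed (the frame contents are made up for illustration):

```python
import pandas as pd

# Two separately constructed frames holding exactly the same data.
df1 = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
df2 = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

same_object = df1 is df2        # identity: they are distinct objects
same_content = df1.equals(df2)  # content: the data is identical

# DataFrames are mutable and do not support the built-in hash().
try:
    hash(df1)
    hashable = True
except TypeError:
    hashable = False

print(same_object, same_content, hashable)
```

Any key derived from object identity treats df1 and df2 as different inputs, even though a cached result for one would be valid for the other.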

To overcome this, we can define custom cache-key generation for our function, so that it handles custom (non-built-in) types correctly:

from random import random
import hashlib

import pandas as pd
from cachier import cachier

from .util import _my_mongodb_getter

def _my_custom_param_hasher(a: int, df: pd.DataFrame):
    # hash the dataframe's content (not its identity) into a stable digest
    df_hash = hashlib.sha256(
        pd.util.hash_pandas_object(df).values.tobytes()
    ).hexdigest()
    # we build a tuple, and not a list, as a cache key
    # because tuples are immutable, and thus hashable
    return (a, df_hash)

@cachier(mongetter=_my_mongodb_getter, hash_params=_my_custom_param_hasher)
def big_calc(a: int, df: pd.DataFrame):
    """This has some big calculation inside."""
    return random()  # just placeholder code

This is one possible way to generate deterministic cache keys for such a set of parameters.
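To see that a content-based digest behaves as intended, here is a small standalone sketch that does not involve cachier at all. The helper names dataframe_digest and example_key are hypothetical, chosen just for this illustration:

```python
import hashlib

import pandas as pd


def dataframe_digest(df: pd.DataFrame) -> str:
    """Return a hex digest computed from the dataframe's rows and index."""
    row_hashes = pd.util.hash_pandas_object(df)  # one uint64 hash per row
    return hashlib.sha256(row_hashes.values.tobytes()).hexdigest()


def example_key(a: int, df: pd.DataFrame) -> tuple:
    """Build a hashable key from an int and a dataframe's content."""
    return (a, dataframe_digest(df))


df1 = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
df2 = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})  # same content
df3 = pd.DataFrame({"x": [1, 2, 4], "y": ["a", "b", "c"]})  # one value differs

key1 = example_key(1, df1)
key2 = example_key(1, df2)
key3 = example_key(1, df3)

print(key1 == key2)  # equal content gives equal keys
print(key1 == key3)  # changed content gives a different key
```

Because the key is a tuple of an int and a string, it can be hashed and compared like any other built-in value.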

For a more general approach, check out the following test, which implements a param-hasher function that handles arbitrary args and kwargs as inputs, and also handles pandas.DataFrame objects. You can use it as a template and add hashing for each custom class you want to support:

https://github.com/shaypal5/cachier/blob/master/tests/test_mongo_core.py#L259
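Applied to the get_data(session, query) case from the question, the same template idea could look roughly like the sketch below. It is not the code from the linked test. It assumes the hasher is called with the positional arguments as a tuple and the keyword arguments as a dict, as the description of that test suggests, so confirm the parameter name and signature against your cachier version. The Session class and its dsn attribute are hypothetical stand-ins for whatever session object you actually use:

```python
class Session:
    """Hypothetical stand-in for a DB or HTTP session object."""

    def __init__(self, dsn: str):
        self.dsn = dsn


def _to_key(obj):
    """Convert one argument into a hashable, content-based value."""
    if isinstance(obj, Session):
        # Identify a session by what it connects to, not by object identity.
        return ("Session", obj.dsn)
    if isinstance(obj, dict):
        # Assumes dict keys are mutually sortable (for example, all strings).
        return tuple(sorted((k, _to_key(v)) for k, v in obj.items()))
    if isinstance(obj, (list, tuple)):
        return tuple(_to_key(v) for v in obj)
    return obj  # assumed to be hashable already (int, str, ...)


def hash_params(args, kwargs):
    """Build a deterministic cache key from args (tuple) and kwargs (dict)."""
    key_args = tuple(_to_key(a) for a in args)
    key_kwargs = tuple(sorted((name, _to_key(v)) for name, v in kwargs.items()))
    return key_args + key_kwargs


s1 = Session("db://host/prod")
s2 = Session("db://host/prod")  # a different object for the same target

key1 = hash_params((s1, "SELECT 1"), {})
key2 = hash_params((s2, "SELECT 1"), {})
key3 = hash_params((s1, "SELECT 2"), {})

print(key1 == key2, key1 == key3)
```

Note that in this sketch a value passed positionally and the same value passed by keyword produce different keys, so call the cached function consistently or normalize the arguments first.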

@cpatrickalves
Author

I got it! =)
Thanks!! @shaypal5 I really appreciate your help!
