Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wrap_kvs inconsistencies (due to incorrect signature-based conditioning) #9

Open
thorwhalen opened this issue May 9, 2022 · 2 comments

Comments

@thorwhalen
Copy link
Member

thorwhalen commented May 9, 2022

And example of the problem with wrap_kvs(...obj_of_data).

from dol import Files, wrap_kvs

s = wrap_kvs(
     Files(rootdir), 
     obj_of_data=lambda x: bytes.decode(x)
)

s.head()  # works fine (as long as the first item's bytes can be decoded (as text)

But, though bytes.decode is "equivalent" to lambda x: bytes.decode(x), the following doesn't work:

from dol import Files, wrap_kvs

s = wrap_kvs(
     Files(rootdir), 
     obj_of_data=bytes.decode,
)

s.head() 

Instead we get:

TypeError: descriptor 'decode' for 'bytes' objects doesn't apply to a 'Store' object

The reason is that there's some dark (and apparently bad) magic happening behind the scenes.
Probably something that checks the signature of obj_of_data and does something (not so) "smart" about how it applies the function.

┆Issue is synchronized with this Asana task by Unito

@thorwhalen
Copy link
Member Author

Another example:

from dol import Files, wrap_kvs, Pipe, filt_iter
from py2store import DirReader

rootdir = '/Users/Thor.Whalen/tmp/wikis/'  # use any folder that contains git cloned packages

wrapper = wrap_kvs(obj_of_data=filt_iter(filt=lambda x: not x.split('/')[-2].startswith('.')))
MyStore = Pipe(DirReader, wrapper))
s = MyStore(rootdir)
list(s)  # works
# but this fails:
v = s[next(iter(s))]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [20], line 1
----> 1 v = s[next(iter(s))]
      2 v

File ~/Dropbox/dev/p3/proj/i/dol/dol/base.py:616, in Store.__getitem__(self, k)
    614 except self._errors_that_trigger_missing:
    615     data = self.__missing__(k)
--> 616 return self._obj_of_data(data)

File ~/Dropbox/dev/p3/proj/i/dol/dol/trans.py:1522, in _wrap_outcoming.<locals>.new_method(self, x)
   1515 @wraps(wrapped_func)
   1516 def new_method(self, x):
   1517     # # Long form (for explanation)
   (...)
   1520     # transformed_output_of_super_method = trans_func(self, output_of_super_method)
   1521     # return transformed_output_of_super_method
-> 1522     return trans_func(
   1523         self, getattr(super(store_cls, self), wrapped_method)(x)
   1524     )

TypeError: _func_wrapping_store_in_cls_if_not_type() takes 1 positional argument but 2 were given

Replace the wrapper with:

wrapper = wrap_kvs(obj_of_data=lambda v: filt_iter(v, filt=lambda x: not x.split('/')[-2].startswith('.')))

and it will work though.

So obviously something to do with the signature of obj_of_data -- or rather how the signature is used to decide what to do in wrap_kvs

@thorwhalen thorwhalen changed the title wrap_kvs(...obj_of_data) inconsistency. wrap_kvs inconsistencies (due to incorrect signature-based conditioning) Feb 28, 2023
@thorwhalen
Copy link
Member Author

Another problem with kv_wraps.
This one is not due to an incorrect signature based conditioning, but a similar source of evil: The "figure out how to use the trans_func based on some condition on this trans_func.

First, let's reproduce the problem.

The following works:

import dol, recode

# The folder I used, here: https://www.dropbox.com/sh/q61xgnce0br7udd/AABOYefWcHjnaygTfZFvs2Ofa?dl=0
# if you have graze, you can get it with:
# import graze
# url = 'https://www.dropbox.com/sh/q61xgnce0br7udd/AABOYefWcHjnaygTfZFvs2Ofa?dl=0'
# graze.graze(url);
# path = graze.url_to_filepath(url)

path = '/Users/thorwhalen/graze/https/www.dropbox.com_f/sh_f/q61xgnce0br7udd_f/AABOYefWcHjnaygTfZFvs2Ofa?dl=0'

s = dol.FilesOfZip(path)

codec = recode.mk_codec('h')

trans = dol.wrap_kvs(
    obj_of_data=lambda x: codec.decode(x), 
    data_of_obj=lambda x: codec.encode(x)
)

# trans = dol.kv_wrap(w)

ZippedPcms = trans(dol.FilesOfZip)
s = ZippedPcms(path)

k, v = s.head()
assert isinstance(next(iter(s.values())), list)

But if you replace the trans def with:

trans = dol.wrap_kvs(
    obj_of_data=codec.decode,
    data_of_obj=codec.encode
)

you get a

AttributeError: type object 'type' has no attribute 'chk_format'

The "how to use trans_func conditioning happens in _wrap_outcoming and _wrap_ingoing helper functions. For example:

def _wrap_outcoming(
    store_cls: type, wrapped_method: str, trans_func: Optional[callable] = None
):
    if trans_func is not None:
        wrapped_func = getattr(store_cls, wrapped_method)
        if not _has_unbound_self(trans_func):
            @wraps(wrapped_func)
            def new_method(self, x):
                return trans_func(getattr(super(store_cls, self), wrapped_method)(x))

        else:
            @wraps(wrapped_func)
            def new_method(self, x):
                return trans_func(
                    self, getattr(super(store_cls, self), wrapped_method)(x)
                )

        setattr(store_cls, wrapped_method, new_method)

The idea is, if the trans_func is meant to be given the instance of the story, the call happens differently.
Perhaps this doesn't have to be the case -- or there's a way to call the function in the same way for both cases.
I don't see any at this point, so assuming there's not, we have to figure out a better way to separate both cases.

One way is to engineer our condition better.

Another, that I would favor at this point, is to ask the user to mark the least frequent case (I believe it's the "include the store instance" case) explicitly. For example, by wrapping it in a class, such as IncludeStoreInCall(trans_func). Then the condition becomes simply:

if not isinstance(trans_func, IncludeStoreInCall):
    ...
else:
    trans_func = IncludeStoreInCall.trans_func
    ...

The only reason I'm not implementing this now is that this would be a breaking change, so we'd need to figure out how to handle this change, search for usages in other packages, write some tests, and refactor where needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant