Support for API Extensions #1465

Closed · scook12 opened this issue May 6, 2020 · 5 comments · Fixed by #1617
Labels: discussions, enhancement (New feature or request)

Comments

@scook12 (Contributor) commented May 6, 2020

Issue

pandas exposes a pretty simple API that lets library developers extend pandas objects by registering custom accessors. It would be awesome if koalas supported a similar feature.

Resources

Docs: https://pandas.pydata.org/pandas-docs/stable/reference/extensions.html

Public API: https://github.com/pandas-dev/pandas/blob/master/pandas/api/extensions/__init__.py

Accessors: https://github.com/pandas-dev/pandas/blob/master/pandas/core/accessor.py

#420 has a possibly related discussion on pandas extension dtypes.
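
For context, here is roughly what the registration pattern looks like on the pandas side (a minimal sketch of the documented pattern; the 'geo' accessor name and the lat/lon columns are just illustrative):

import pandas as pd
from pandas.api.extensions import register_dataframe_accessor


@register_dataframe_accessor("geo")
class GeoAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    @property
    def center(self):
        # Mean of the latitude/longitude columns of this frame.
        return self._obj.lat.mean(), self._obj.lon.mean()


df = pd.DataFrame({"lat": [10.0, 20.0], "lon": [100.0, 120.0]})
print(df.geo.center)  # (15.0, 110.0)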

HyukjinKwon added the discussions and enhancement labels on May 7, 2020
@achapkowski commented

For PySpark DataFrames, you can register custom accessors by doing the following. If this gets added to core, I believe it will work here too.

import warnings
from functools import wraps


class CachedAccessor:
    """
    Custom property-like object (descriptor) for caching accessors.

    Parameters
    ----------
    name : str
        The namespace this will be accessed under, e.g. ``df.foo``
    accessor : cls
        The class with the extension methods.

    NOTE
    ----
    Modified based on pandas.core.accessor.
    """

    def __init__(self, name, accessor):
        self._name = name
        self._accessor = accessor

    def __get__(self, obj, cls):
        if obj is None:
            # we're accessing the attribute of the class, i.e., Dataset.geo
            return self._accessor
        accessor_obj = self._accessor(obj)
        # Replace the property with the accessor object. Inspired by:
        # http://www.pydanny.com/cached-property.html
        setattr(obj, self._name, accessor_obj)
        return accessor_obj


def _register_accessor(name, cls):
    """
    NOTE
    ----
    Modified based on pandas.core.accessor.
    """

    def decorator(accessor):
        if hasattr(cls, name):
            warnings.warn(
                "registration of accessor {!r} under name {!r} for type "
                "{!r} is overriding a preexisting attribute with the same "
                "name.".format(accessor, name, cls),
                UserWarning,
                stacklevel=2,
            )
        setattr(cls, name, CachedAccessor(name, accessor))
        return accessor

    return decorator


def register_dataframe_accessor(name):
    """
    NOTE
    ----
    Modified based on pandas.core.accessor.
    """
    try:
        from pyspark.sql import DataFrame
    except ImportError:
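        # NOTE: import_message is an external helper (not defined in this
        # snippet) that tells the user how to install the missing dependency.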
        import_message(
            submodule="spark",
            package="pyspark",
            conda_channel="conda-forge",
            pip_install=True,
        )

    return _register_accessor(name, DataFrame)


def register_dataframe_method(method):
    """Register a function as a method attached to the Pyspark DataFrame.

    NOTE
    ----
    Modified based on pandas_flavor.register.
    """

    def inner(*args, **kwargs):
        class AccessorMethod:
            def __init__(self, pyspark_obj):
                self._obj = pyspark_obj

            @wraps(method)
            def __call__(self, *args, **kwargs):
                return method(self._obj, *args, **kwargs)

        register_dataframe_accessor(method.__name__)(AccessorMethod)

        return method

    return inner()

Then register an accessor with it:

@register_dataframe_accessor('amazingtimes')
class AmazingNameDataFrameAccessor:
    def __init__(self, data):
        self._data = data
        print('foo')

    @property
    def hello(self):
        return 'pyspark accessor'

    def method(self, a=1):
        """this is a method example"""
        a += a
        return a

    @property
    def columns(self):
        return self._data.schema.names

Usage

print(df.amazingtimes.hello)
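
The accessor's methods and properties work the same way, and register_dataframe_method covers the single-function case. A quick sketch (with_greeting is a made-up example; df is any PySpark DataFrame):

from pyspark.sql import functions as F

print(df.amazingtimes.method(a=2))  # 4
print(df.amazingtimes.columns)      # the DataFrame's column names


@register_dataframe_method
def with_greeting(df, greeting='hi'):
    """Attach a constant greeting column (hypothetical example)."""
    return df.withColumn('greeting', F.lit(greeting))


df.with_greeting('hello').show()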

@HyukjinKwon (Member) commented

Seems pretty good. @achapkowski, are you interested in opening a PR? Let's make sure the documentation and usage are similar to, or the same as, pandas'.

@scook12 (Contributor, Author) commented Jun 11, 2020

@achapkowski let me know if you're going to take this on - if not, I can take a look at it next week.

@HyukjinKwon (Member) commented

Please go ahead @scook12!

@scook12 (Contributor, Author) commented Jun 20, 2020

Thanks @HyukjinKwon!

HyukjinKwon pushed a commit that referenced this issue Jul 1, 2020