Support for API Extensions #1465
Comments
For pyspark dataframes, you can register custom accessors by doing the following. If this gets added to core, I believe it will work here too.

```python
import warnings
from functools import wraps


class CachedAccessor:
    """
    Custom property-like object (descriptor) for caching accessors.

    Parameters
    ----------
    name : str
        The namespace this will be accessed under, e.g. ``df.foo``
    accessor : cls
        The class with the extension methods.

    NOTE
    ----
    Modified based on pandas.core.accessor.
    """

    def __init__(self, name, accessor):
        self._name = name
        self._accessor = accessor

    def __get__(self, obj, cls):
        if obj is None:
            # we're accessing the attribute of the class, i.e., Dataset.geo
            return self._accessor
        accessor_obj = self._accessor(obj)
        # Replace the property with the accessor object. Inspired by:
        # http://www.pydanny.com/cached-property.html
        setattr(obj, self._name, accessor_obj)
        return accessor_obj


def _register_accessor(name, cls):
    """
    NOTE
    ----
    Modified based on pandas.core.accessor.
    """

    def decorator(accessor):
        if hasattr(cls, name):
            warnings.warn(
                "registration of accessor {!r} under name {!r} for type "
                "{!r} is overriding a preexisting attribute with the same "
                "name.".format(accessor, name, cls),
                UserWarning,
                stacklevel=2,
            )
        setattr(cls, name, CachedAccessor(name, accessor))
        return accessor

    return decorator


def register_dataframe_accessor(name):
    """
    NOTE
    ----
    Modified based on pandas.core.accessor.
    """
    try:
        from pyspark.sql import DataFrame
    except ImportError:
        # import_message is a helper from the original codebase that raises
        # an informative error with installation instructions.
        import_message(
            submodule="spark",
            package="pyspark",
            conda_channel="conda-forge",
            pip_install=True,
        )
    return _register_accessor(name, DataFrame)


def register_dataframe_method(method):
    """Register a function as a method attached to the pyspark DataFrame.

    NOTE
    ----
    Modified based on pandas_flavor.register.
    """

    def inner(*args, **kwargs):
        class AccessorMethod:
            def __init__(self, pyspark_obj):
                self._obj = pyspark_obj

            @wraps(method)
            def __call__(self, *args, **kwargs):
                return method(self._obj, *args, **kwargs)

        register_dataframe_accessor(method.__name__)(AccessorMethod)
        return method

    # inner() is invoked immediately: decorating a function registers it
    # as an accessor and returns the original function unchanged.
    return inner()
```

Then call the code:

```python
@register_dataframe_accessor('amazingtimes')
class AmazingNameDataFrameAccessor:
    def __init__(self, data):
        self._data = data
        print('foo')

    @property
    def hello(self):
        return 'pyspark accessor'

    def method(self, a=1):
        """this is a method example"""
        a += a
        return a

    @property
    def columns(self):
        return self._data.schema.names
```

Usage:

```python
print(df.amazingtimes.hello)
```
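For anyone trying this out, here is a minimal end-to-end sketch of both registration styles. It assumes a local SparkSession and the helpers defined above; the DataFrame contents and the `ncols` function are hypothetical, just for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])

# Accessor style: the class registered under 'amazingtimes' above.
print(df.amazingtimes.hello)      # 'pyspark accessor' ('foo' prints once, on first access)
print(df.amazingtimes.columns)    # ['id', 'letter']
print(df.amazingtimes.method(3))  # 6

# Method style: a hypothetical function attached directly to DataFrame.
@register_dataframe_method
def ncols(df):
    """Return the number of columns."""
    return len(df.columns)

print(df.ncols())  # 2
```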
Seems pretty good. @achapkowski are you interested in opening a PR? Let's make sure the documentation and usage are similar to or the same as pandas'.
@achapkowski let me know if you're going to take this on - if not, I can take a look at it next week.
Please go ahead @scook12!
Thanks @HyukjinKwon!
Issue
pandas exposes a pretty simple API to let library developers extend pandas objects via registering accessors. It would be awesome if koalas would support a similar feature.
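For reference, this is what the pattern looks like on the pandas side; a minimal sketch adapted from the geo example in the pandas docs, using the documented `pandas.api.extensions.register_dataframe_accessor`:

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("geo")
class GeoAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    @property
    def center(self):
        # Return the geographic center point of this DataFrame.
        lat = self._obj.latitude
        lon = self._obj.longitude
        return (float(lon.mean()), float(lat.mean()))


df = pd.DataFrame({"longitude": [-93.0, -94.0], "latitude": [44.0, 45.0]})
print(df.geo.center)  # (-93.5, 44.5)
```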
Resources
Docs: https://pandas.pydata.org/pandas-docs/stable/reference/extensions.html
Public API: https://github.com/pandas-dev/pandas/blob/master/pandas/api/extensions/__init__.py
Accessors: https://github.com/pandas-dev/pandas/blob/master/pandas/core/accessor.py
#420 has a possibly related discussion on pandas extension dtypes.