Leverage pandas' ExtensionDtype for defining efficient new types #76

sbrugman · 2020-05-22T12:11:49Z

Visions currently supports defining custom types, such as Path, File and URL. These types inherit from object and are stored as uniquely defined classes. For instance, this means that URL is stored as the namedtuple ParseResult that is returned by urlparse.

This strategy is effective in applications where the series would have been converted to the object type anyway, and it poses no problem for small to medium-sized datasets. For larger datasets we should consider an additional strategy, where a new (d)type is created as an alias for an existing pandas dtype. Allowing for these kinds of abstractions addresses one of the major shortcomings in pandas at the moment. Custom dtypes generally reduce the memory footprint and reduce the computational complexity of type membership checks from O(n) to O(1). The same functionality could be maintained through an accessor (series.path, just like series.dt).
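To make the accessor idea concrete, here is a minimal sketch using pandas' `register_series_accessor`. The accessor name `path` follows the example above; the `PathAccessor` class and its `suffix` and `name` properties are hypothetical illustrations and not part of visions or pandas.

```python
from pathlib import PurePosixPath

import pandas as pd


@pd.api.extensions.register_series_accessor("path")
class PathAccessor:
    """Hypothetical accessor exposing path helpers on a series of strings."""

    def __init__(self, series: pd.Series):
        self._series = series

    def _map(self, func):
        # Parse each non-missing value as a path and apply func to it;
        # missing values are passed through untouched.
        return self._series.map(
            lambda value: func(PurePosixPath(value)), na_action="ignore"
        )

    @property
    def suffix(self) -> pd.Series:
        # File extension of every path, e.g. ".csv".
        return self._map(lambda p: p.suffix)

    @property
    def name(self) -> pd.Series:
        # Final path component of every path, e.g. "a.csv".
        return self._map(lambda p: p.name)


paths = pd.Series(["/data/a.csv", "/data/b.parquet"])
suffixes = paths.path.suffix
names = paths.path.name
```

In this sketch the values stay plain strings in the series, and the path-specific behaviour lives entirely in the accessor, which is what would let the underlying storage be an existing pandas dtype.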

Two implementation considerations:

pandas' StringDtype and ExtensionDtype are experimental and may change. The code for this enhancement should therefore be a minimal layer over the pandas interface.
StringDtype was introduced in pandas v1.0.0. ExtensionDtype, however, was introduced earlier. Visions should provide backwards compatibility.
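One way the backwards-compatibility consideration could be handled is a small feature check that falls back to the object dtype on older pandas versions. This is only a sketch; the helper name `string_backed_dtype` is hypothetical and does not exist in visions.

```python
import pandas as pd


def string_backed_dtype():
    """Return pandas' StringDtype when available, otherwise the object dtype."""
    # pandas < 1.0.0 has no StringDtype attribute, so fall back to "object".
    string_dtype = getattr(pd, "StringDtype", None)
    if string_dtype is not None:
        return string_dtype()
    return "object"


# Missing values are preserved under either dtype.
urls = pd.Series(["https://example.com", None]).astype(string_backed_dtype())
```

Checking for the feature rather than parsing the version string keeps the layer over the pandas interface thin, in line with the first consideration.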

A type-agnostic solution is proposed in the linked PR.


sbrugman · 2020-05-22T13:14:08Z

Pending pandas-dev/pandas#34309 and pandas-dev/pandas#34310.

jamesmyatt · 2020-11-30T21:54:33Z

Might be worth looking at cyberpandas (https://github.com/ContinuumIO/cyberpandas), which implements an IPAddress extension array.

sbrugman · 2020-11-30T22:04:38Z

@jamesmyatt Thanks for thinking along! cyberpandas is an excellent demonstration of how adding new types can be useful. On the other hand, it demonstrates how involved adding a type can get with pandas. The pandas devs are (currently) not keen on supporting subclassing of other ExtensionDtypes.

sbrugman added the enhancement New feature or request label May 22, 2020

sbrugman mentioned this issue May 22, 2020

[WIP] Pandas dynamically create StringDtype aliases #77

Closed
