Visions currently supports defining custom types, such as Path, File and URL. These types inherit from object and are stored as uniquely defined classes. For instance, a URL is stored as the namedtuple ParseResult returned by urlparse.
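To make the current storage concrete, here is a minimal sketch of what an object-backed URL series looks like. The sample URLs and the variable names `raw` and `parsed` are made up for illustration; this is not Visions' actual conversion code.

```python
from urllib.parse import ParseResult, urlparse

import pandas as pd

# Hypothetical sample data, for illustration only.
raw = pd.Series(["https://example.com/a?b=1", "http://example.org/index.html"])

# Each element becomes a ParseResult namedtuple, so the series has object dtype.
parsed = pd.Series([urlparse(u) for u in raw], dtype=object)

print(parsed.dtype)
print(parsed.iloc[0].netloc)
```

Every element is a full Python object here, which is where the memory overhead on larger datasets comes from.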
This strategy is effective in applications where the series would have been converted to the object type anyway, and it poses no problem for small to medium-sized datasets. For larger datasets we should consider an additional strategy, where a new (d)type is created as an alias for an existing pandas dtype. Allowing for these kinds of abstractions addresses one of the major shortcomings in pandas at the moment. Custom dtypes generally reduce memory usage and bring the computational complexity of membership checks down from O(n) to O(1), since the type can be read from the series' dtype instead of inspecting every element. The same functionality could be maintained through an accessor (series.path, just like series.dt).
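A rough sketch of the accessor idea, assuming paths stay stored as plain strings and are only parsed on demand. `PathAccessor` and its `suffix`/`stem` properties are hypothetical names chosen for this example; only `register_series_accessor` is the real pandas API.

```python
import pathlib

import pandas as pd


@pd.api.extensions.register_series_accessor("path")
class PathAccessor:
    """Hypothetical accessor exposing path attributes on a string series."""

    def __init__(self, series):
        self._series = series

    @property
    def suffix(self):
        # Parse lazily; the underlying data stays a series of strings.
        return self._series.map(lambda p: pathlib.PurePosixPath(p).suffix)

    @property
    def stem(self):
        return self._series.map(lambda p: pathlib.PurePosixPath(p).stem)


s = pd.Series(["/data/a.csv", "/data/b.txt"])
print(s.path.suffix.tolist())
```

The data keeps its compact string representation, while `s.path` offers roughly the same convenience as holding `pathlib` objects in an object-typed series.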
Two implementation considerations:
pandas' StringDtype and ExtensionDtype are experimental and may change. The code for this enhancement should therefore be a minimal layer over the pandas interface.
StringDtype was introduced in pandas v1.0.0. ExtensionDtype, however, was introduced earlier. Visions should provide backwards compatibility.
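One way the two considerations could be handled is a thin feature check around the pandas interface. This is only a sketch: `HAS_STRING_DTYPE` and `to_string_backed` are hypothetical names, and falling back to the object dtype on older pandas is an assumption, not something decided in this issue.

```python
import pandas as pd

# pd.StringDtype only exists in pandas >= 1.0.0.
HAS_STRING_DTYPE = hasattr(pd, "StringDtype")


def to_string_backed(series):
    """Use the dedicated string dtype when available, else keep object storage."""
    if HAS_STRING_DTYPE:
        return series.astype("string")
    # Assumed fallback for older pandas: leave values in an object series.
    return series.astype(object)


result = to_string_backed(pd.Series(["a", "b"]))
print(result.dtype)
```

Keeping the check in a single place means a change in the experimental pandas API would only touch this layer.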
A type-agnostic solution is proposed in the linked PR.
@jamesmyatt Thanks for thinking along! cyberpandas is an excellent demonstration of how adding new types can be useful. On the other hand, it demonstrates how involved adding a type can get with pandas. The pandas devs are (currently) not keen on supporting subclassing of other ExtensionDtypes.