Investigate whether we can speed up operations on tabular data by using another backend #196
Partially closes #196.

### Summary of Changes

* Add `polars`
* Create `ColumnType` for `polars` data types
* Create `Schema` for `polars` data frames

---------

Co-authored-by: megalinter-bot <[email protected]>
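A minimal, stdlib-only sketch of what mapping `polars` data types onto column types and collecting them into a schema could look like. `ColumnType`, `Schema`, `from_dtype_names` and the mapping table are hypothetical, not the classes from this PR. Dtype names are passed as plain strings so the example runs without `polars`; with `polars` installed, something like `{name: str(dtype) for name, dtype in df.schema.items()}` could produce that input, assuming a dtype's string form is its name.

```python
from dataclasses import dataclass
from enum import Enum


class ColumnType(Enum):
    # Hypothetical column types; the real Safe-DS classes may differ.
    INTEGER = "Integer"
    REAL_NUMBER = "RealNumber"
    STRING = "String"
    BOOLEAN = "Boolean"
    ANYTHING = "Anything"


# Assumed mapping from polars dtype names to column types.
# Names not listed here (e.g. nested or temporal types) fall back to ANYTHING.
_DTYPE_NAME_TO_COLUMN_TYPE: dict[str, ColumnType] = {
    **{
        name: ColumnType.INTEGER
        for name in ("Int8", "Int16", "Int32", "Int64", "UInt8", "UInt16", "UInt32", "UInt64")
    },
    "Float32": ColumnType.REAL_NUMBER,
    "Float64": ColumnType.REAL_NUMBER,
    "Utf8": ColumnType.STRING,    # older polars name for the string type
    "String": ColumnType.STRING,  # newer polars name for the string type
    "Boolean": ColumnType.BOOLEAN,
}


def column_type_from_dtype_name(dtype_name: str) -> ColumnType:
    """Translate one polars dtype name into a (hypothetical) ColumnType."""
    return _DTYPE_NAME_TO_COLUMN_TYPE.get(dtype_name, ColumnType.ANYTHING)


@dataclass
class Schema:
    # Column name -> column type, in the column order of the data frame.
    column_types: dict[str, ColumnType]

    @staticmethod
    def from_dtype_names(dtype_names: dict[str, str]) -> "Schema":
        return Schema(
            {name: column_type_from_dtype_name(dtype) for name, dtype in dtype_names.items()}
        )


# Usage: "List" is not in the mapping, so it becomes ANYTHING.
schema = Schema.from_dtype_names({"id": "Int64", "name": "Utf8", "tags": "List"})
```

Falling back to a catch-all type keeps the schema total: every column gets some type, even for dtypes the mapping does not know yet.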
Partially closes #196. Closes #149.

### Summary of Changes

* `Row` now uses a `polars.DataFrame` instead of a `pandas.Series` to store its data. The `DataFrame` can directly store the column names.
* Remove the `__hash__` method. A `Row` can no longer be used in a `set` or as the key of a `dict`. If we find a use case for this, we'll add it back.

---------

Co-authored-by: megalinter-bot <[email protected]>
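Removing `__hash__` while keeping content-based equality makes `Row` unhashable in Python. The stdlib-only sketch below shows that effect with a hypothetical `Row` that keeps its column names alongside its values; it is not the actual Safe-DS class, and `get_value` and `column_names` are illustrative names.

```python
class Row:
    """Hypothetical minimal Row: column names plus values, compared by content."""

    def __init__(self, data: dict[str, object]) -> None:
        # Keep names and values together, like a one-row data frame would.
        self._column_names = list(data.keys())
        self._values = list(data.values())

    @property
    def column_names(self) -> list[str]:
        return list(self._column_names)

    def get_value(self, column_name: str) -> object:
        return self._values[self._column_names.index(column_name)]

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Row):
            return NotImplemented
        return self._column_names == other._column_names and self._values == other._values

    # Defining __eq__ without __hash__ already makes Python set __hash__ to None;
    # stating it explicitly documents that Row is deliberately unhashable.
    __hash__ = None  # type: ignore[assignment]


row = Row({"a": 1, "b": "x"})
try:
    {row}  # putting a Row into a set needs a hash
except TypeError as error:
    print(error)  # unhashable type: 'Row'
```

Equality still works (`Row({"a": 1}) == Row({"a": 1})`); only hashing, and therefore `set` membership and `dict` keys, is lost.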
## [0.11.0](v0.10.0...v0.11.0) (2023-04-21)

### Features

* `OneHotEncoder.inverse_transform` now maintains the column order from the original table ([#195](#195)) ([3ec0041](3ec0041)), closes [#109](#109)
* add `plot_` prefix back to plotting methods ([#212](#212)) ([e50c3b0](e50c3b0)), closes [#211](#211)
* adjust `Column`, `Schema` and `Table` to changes in `Row` ([#216](#216)) ([ca3eebb](ca3eebb))
* back `Row` by a `polars.DataFrame` ([#214](#214)) ([62ca34d](62ca34d)), closes [#196](#196) [#149](#149)
* clean up `Row` class ([#215](#215)) ([b12fc68](b12fc68))
* convert between `Row` and `dict` ([#206](#206)) ([e98b653](e98b653)), closes [#204](#204)
* convert between a `dict` and a `Table` ([#198](#198)) ([2a5089e](2a5089e)), closes [#197](#197)
* create column types for `polars` data types ([#208](#208)) ([e18b362](e18b362)), closes [#196](#196)
* dataframe interchange protocol ([#200](#200)) ([bea976a](bea976a)), closes [#199](#199)
* move existing ML solutions into `safeds.ml.classical` package ([#213](#213)) ([655f07f](655f07f)), closes [#210](#210)

### Bug Fixes

* `table.keep_only_columns` now maps column names to correct data ([#194](#194)) ([459ab75](459ab75)), closes [#115](#115)
* typo in type hint ([#184](#184)) ([e79727d](e79727d)), closes [#180](#180)
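The conversion between a `dict` and a `Table` can be pictured with a toy, stdlib-only table that stores its data column-wise. This is a sketch under assumptions: `Table`, `from_dict`, `to_dict`, `number_of_rows` and `get_row` are hypothetical names, not necessarily the Safe-DS API. Rows are addressed only by position, with no separate index.

```python
from typing import Any


class Table:
    """Hypothetical column-oriented table; not the Safe-DS implementation."""

    def __init__(self, columns: dict[str, list[Any]]) -> None:
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("All columns must have the same number of values.")
        # dicts preserve insertion order, so the column order is kept as given.
        self._columns = {name: list(values) for name, values in columns.items()}

    @staticmethod
    def from_dict(data: dict[str, list[Any]]) -> "Table":
        return Table(data)

    def to_dict(self) -> dict[str, list[Any]]:
        # Return copies so callers cannot mutate the table's internal lists.
        return {name: list(values) for name, values in self._columns.items()}

    @property
    def number_of_rows(self) -> int:
        return len(next(iter(self._columns.values()), []))

    def get_row(self, index: int) -> dict[str, Any]:
        # Rows are addressed purely by position; there is no index column.
        return {name: values[index] for name, values in self._columns.items()}


table = Table.from_dict({"a": [1, 2], "b": ["x", "y"]})
row_as_dict = table.get_row(1)
```

Round-tripping through `to_dict` returns the same mapping, which is the property a `dict` to `Table` conversion needs to preserve.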
Overall,

```python
import polars as pl

series = pl.Series("col", [1, "a", True, None])
for value in series:
    print(value)
# Output:
# None
# a
# None
# None
```

This document mentions that an … Likewise, other libraries also need to support …
### Summary of Changes

In #214 we changed the implementation of `Row` so that its data was stored in a `polars.DataFrame`. As explained [here](#196 (comment)), `pandas` works better for us for now. We might undo this change in the future if the type inference of `polars` improves (or if we decide to implement it ourselves).

---------

Co-authored-by: megalinter-bot <[email protected]>
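If the type inference were implemented in-house, one hedged option is to inspect every non-missing value and fall back to a catch-all type when the values disagree, instead of turning values into nulls as in the output reported for `pl.Series("col", [1, "a", True, None])`. The function name `infer_column_type`, the type names and the rule that mixing integers with floats widens to `"RealNumber"` are assumptions for illustration only.

```python
from typing import Any, Iterable


def _type_name(value: Any) -> str:
    # bool is a subclass of int in Python, so it must be checked first.
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, float):
        return "RealNumber"
    if isinstance(value, str):
        return "String"
    return "Anything"


def infer_column_type(values: Iterable[Any]) -> tuple[str, bool]:
    """Return (type name, is_nullable) without converting or dropping any value."""
    values = list(values)
    is_nullable = any(value is None for value in values)
    names = {_type_name(value) for value in values if value is not None}
    if len(names) == 1:
        return names.pop(), is_nullable
    if names == {"Integer", "RealNumber"}:
        # Assumed widening rule: a mix of ints and floats is a real-number column.
        return "RealNumber", is_nullable
    # Mixed (or no non-missing) values: keep them as-is under a catch-all type.
    return "Anything", is_nullable


column_type = infer_column_type([1, "a", True, None])
```

Here the mixed column is reported as `("Anything", True)` and all four original values stay available, which is the behavior the comment above found missing.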
* `pandas` v2.0.0 introduces `pyarrow` as a new backend, which is supposedly faster than `numpy`
* Use `pyarrow` directly
* Replace `pandas` by `polars`
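Comparing these options means timing the same operation under each backend. A minimal harness, assuming each candidate is wrapped as a zero-argument callable: `compare_backends` is a hypothetical helper, and the row- versus column-oriented sums are stdlib stand-ins. A real comparison would wrap the same operation in `pandas` (with `numpy` and with `pyarrow` data), plain `pyarrow`, and `polars`.

```python
import timeit
from typing import Callable


def compare_backends(
    candidates: dict[str, Callable[[], object]],
    repeat: int = 5,
    number: int = 10,
) -> dict[str, float]:
    """Return the best observed seconds per call for each named candidate."""
    results: dict[str, float] = {}
    for name, operation in candidates.items():
        # timeit.repeat accepts a callable; it runs it `number` times per
        # measurement and takes `repeat` measurements.
        timings = timeit.repeat(operation, repeat=repeat, number=number)
        # The minimum is the measurement least affected by background noise.
        results[name] = min(timings) / number
    return results


# Stand-in data: the same values stored row-wise and column-wise.
rows = [{"a": i, "b": 2 * i} for i in range(10_000)]
columns = {"a": [row["a"] for row in rows], "b": [row["b"] for row in rows]}

timings = compare_backends(
    {
        "row-oriented sum": lambda: sum(row["a"] for row in rows),
        "column-oriented sum": lambda: sum(columns["a"]),
    }
)
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds * 1e6:.1f} µs per call")
```

Before trusting the numbers, it is worth checking that every candidate computes the same result, so that only the backend differs between measurements.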
Generally, the interface of `polars` seems to be nicer to work with for us than that of `pandas`. `polars` also has no index, which is closer to our design.

### Tasks

* Back `Table` by `polars.DataFrame`
* Back `Row` by `polars.DataFrame`
* Back `Column` by `polars.Series`