Releases: databricks/koalas

Version 0.5.0

22 May 08:39

We refined the packaging and now publish to conda-forge as well as PyPI. Koalas can now be installed with the conda package manager:

conda install koalas -c conda-forge

We also added the following features:

koalas:

koalas.DataFrame:

koalas.Series:

Along with the following improvements:

  • Functions that are deprecated in pandas are now explicitly marked as such; we won't support them without a special reason. (#342)
  • Introduced Index/MultiIndex corresponding to pandas', instead of reusing Series. (#341)

Version 0.4.0

15 May 07:17

We rapidly improved the Koalas documentation and added new functionality in the past week. As of this release, all functions are documented. We also added the following features:

koalas:

  • range (#254) - for generating a distributed sequence of data
  • sql (#256) - for running SQL queries

koalas.DataFrame:

koalas.Series:

Along with the following improvements:

  • Design Principles and Contribution Guide (#246, #255)
  • DataFrame.drop now supports columns parameter (#253)
  • repr and repr_html improvements (#258) - only the first 1000 are shown when the number of rows in a DataFrame or values in a Series exceeds 1000.
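
The repr truncation described in the last item can be illustrated with a minimal sketch. This is not the Koalas implementation: `TinySeries`, `MAX_DISPLAY_ROWS`, and the footer text are hypothetical names chosen for illustration, and plain Python lists stand in for distributed data.

```python
from typing import Iterable

MAX_DISPLAY_ROWS = 1000  # the threshold named in the release note


class TinySeries:
    """Hypothetical stand-in for a Series; not the Koalas class."""

    def __init__(self, values: Iterable[object]) -> None:
        self.values = list(values)

    def __repr__(self) -> str:
        # Render at most MAX_DISPLAY_ROWS values, one "index value" pair per line.
        shown = self.values[:MAX_DISPLAY_ROWS]
        lines = [f"{i}    {v}" for i, v in enumerate(shown)]
        # Add a footer only when something was cut off (footer wording is assumed).
        if len(self.values) > MAX_DISPLAY_ROWS:
            lines.append(f"Showing only the first {MAX_DISPLAY_ROWS}")
        return "\n".join(lines)
```

The point of truncating in `__repr__` is that displaying an object in a notebook never requires rendering the whole dataset.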

Version 0.3.0

07 May 04:27

We fixed a critical bug for Python 3.5 that was introduced in v0.2.0. (#241)

We also added the following features:

koalas.DataFrame:

  • isin
  • to_dict

koalas.Series:

  • isin
  • to_dict

Along with the following improvements:

koalas.Series:

  • __add__ and __radd__ now support string concatenation

koalas.groupby.GroupBy:

  • agg() now preserves the group keys as indices

We also made many code and documentation cleanups.
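
To show why both `__add__` and `__radd__` matter for the string concatenation improvement listed above, here is a minimal sketch. `StrSeries` is a hypothetical list-backed class, not the Koalas `Series`; it only demonstrates the operator protocol.

```python
class StrSeries:
    """Hypothetical list-backed series of strings; not the Koalas class."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        # Handles `series + series` (element-wise) and `series + "suffix"`.
        if isinstance(other, StrSeries):
            return StrSeries(a + b for a, b in zip(self.values, other.values))
        return StrSeries(v + other for v in self.values)

    def __radd__(self, other):
        # Handles `"prefix" + series`: Python falls back to this method
        # after str.__add__ returns NotImplemented for the unknown type.
        return StrSeries(other + v for v in self.values)
```

Without `__radd__`, an expression with the plain string on the left-hand side would raise a `TypeError`.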

Version 0.2.0

02 May 00:56

We implemented many major features in the past week. Here's a summary of what's new in release v0.2.0.

spark.DataFrame:

  • to_koalas is monkey patched into Spark's DataFrame API when the koalas package is imported
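
The monkey patching mentioned above can be sketched as follows. Both classes are hypothetical stand-ins (no Spark is needed to run this), and the actual Koalas code may differ; the sketch only shows the general technique of attaching a method to an existing class.

```python
class SparkLikeDataFrame:
    """Hypothetical stand-in for Spark's DataFrame class."""

    def __init__(self, rows):
        self.rows = rows


class KoalasLikeDataFrame:
    """Hypothetical stand-in for koalas.DataFrame."""

    def __init__(self, sdf):
        self._sdf = sdf  # keep a reference to the wrapped object


def _to_koalas(self):
    # `self` is the Spark-like DataFrame the method is called on.
    return KoalasLikeDataFrame(self)


# In a real package this assignment would run at import time,
# so every existing and future instance gains the method.
SparkLikeDataFrame.to_koalas = _to_koalas
```

After the assignment, `SparkLikeDataFrame([...]).to_koalas()` returns the wrapper even though the original class never defined that method.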

koalas.DataFrame:

  • count
  • corr
  • dtypes
  • groupby
  • sort_values now supports ascending, na_position, and inplace parameters
  • to_numpy
  • to_pandas (with toPandas as an alias for compatibility with Spark)
  • to_string
  • Allow direct literal assignment to create a new column
  • Various stats functions now work with boolean type
  • In notebooks or REPL, automatically display the content of the DataFrame, similar to pandas

koalas.Series:

  • alias (as an alias for rename function)
  • count
  • groupby
  • to_numpy
  • to_pandas (with toPandas as an alias for compatibility with Spark)
  • to_string
  • fillna
  • Various stats functions now work with boolean type
  • In notebooks or REPL, automatically display the content of the Series, similar to pandas
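
The `toPandas` alias for `to_pandas` listed for both classes can be provided with a class-level name binding. The sketch below assumes a hypothetical `FrameSketch` class backed by a plain list; it is not how Koalas collects data.

```python
class FrameSketch:
    """Hypothetical class used only to show the aliasing pattern."""

    def __init__(self, rows):
        self._rows = list(rows)

    def to_pandas(self):
        # A real implementation would collect distributed data into pandas;
        # this sketch simply returns a copy of the local rows.
        return list(self._rows)

    # Spark-style spelling bound to the same function object.
    toPandas = to_pandas
```

Because both names refer to the same function, the two spellings cannot drift apart in behavior.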

We also significantly improved the project's documentation.

Last but not least, we have done some major refactoring of the codebase and its infrastructure to make it more amenable to changes in the future, e.g.

  • Now koalas.DataFrame wraps around a Spark DataFrame, rather than directly monkey patching all methods.
  • Doctests are enabled and can be run directly in PyCharm
  • Mypy type hint linter is added
  • Switched from nose to pytest for test infrastructure.
  • Introduced utility methods to support older versions of pandas. (#210)
  • Code coverage report
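
As an illustration of the doctest and type hint items above, here is a small function written in that style. `fill_missing` is a hypothetical helper, not part of Koalas; the docstring example is checked by Python's standard `doctest` module, and the annotations are what a checker such as Mypy would verify.

```python
import doctest
from typing import List, Optional


def fill_missing(values: List[Optional[int]], default: int) -> List[int]:
    """Replace None entries with a default value.

    >>> fill_missing([1, None, 3], 0)
    [1, 0, 3]
    """
    return [default if v is None else v for v in values]


if __name__ == "__main__":
    # Runs every docstring example in this module and reports failures.
    doctest.testmod()
```

pytest can also collect such examples when run with its `--doctest-modules` option.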

Version 0.1.0

23 Apr 17:03

We rewrote the internals of Koalas to make it more extensible for upcoming features. We also laid the foundation for API reference docs in this release.

Version 0.0.6

19 Apr 17:55
178bc0c
Pre-release

This version significantly expands the number of functions available. It is still meant to be a technology preview, and users are encouraged to report any issues they encounter with their current pandas code.

Noteworthy features:

  • indexing is now supported
  • slicing and accessing columns is much improved
  • most of the methods are accessible as stubs
  • support for N/A (fillna, dropna, etc.) has been added

We thank all the contributors who have contributed to this release.

Version 0.0.5

26 Mar 13:50
Pre-release

This is the initial release outside Databricks.

This release is meant to be a technology preview. See the README.md file for more information.