Version 0.2.0

@rxin rxin released this 02 May 00:56

We implemented a lot of major functionality in the past week. Here's a summary of what's new in release v0.2.0.

spark.DataFrame:

  • to_koalas is monkey patched into Spark's DataFrame API when the koalas package is imported
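
For readers unfamiliar with monkey patching, here is a minimal sketch of the general technique, not the Koalas source. `SparkDataFrame` and `KoalasFrame` are hypothetical stand-ins so the example runs without Spark installed; only the name `to_koalas` comes from the release notes.

```python
# Stand-in for pyspark.sql.DataFrame so the sketch runs without Spark.
class SparkDataFrame:
    def __init__(self, rows):
        self.rows = rows


# Hypothetical wrapper type; the real koalas.DataFrame is far richer.
class KoalasFrame:
    def __init__(self, sdf):
        self._sdf = sdf


def _to_koalas(self):
    """Wrap this Spark-style DataFrame in the pandas-like wrapper."""
    return KoalasFrame(self)


# In a package, this assignment would sit at module level, so it runs
# once as a side effect of importing the package.
SparkDataFrame.to_koalas = _to_koalas

sdf = SparkDataFrame([{"x": 1}, {"x": 2}])
kdf = sdf.to_koalas()  # the method now exists on every instance
```

Because the method is attached to the class rather than to an instance, DataFrames created before the import gain it as well.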

koalas.DataFrame:

  • count
  • corr
  • dtypes
  • groupby
  • sort_values now supports ascending, na_position, and inplace parameters
  • to_numpy
  • to_pandas (with toPandas as an alias for compatibility with Spark)
  • to_string
  • Allow direct literal assignment to create a new column
  • Various stats functions now work with boolean type
  • In notebooks or REPL, automatically display the content of the DataFrame, similar to pandas
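
The new sort_values parameters are named after their pandas counterparts. Assuming Koalas follows the pandas semantics, missing values are kept at one end of the result regardless of sort direction. The sketch below illustrates that rule in plain Python; the standalone `sort_values` function and the list-of-dicts data layout are illustrative assumptions, not the Koalas API. The inplace parameter is left out; in pandas, inplace=True modifies the object and returns None.

```python
import math


def sort_values(rows, by, ascending=True, na_position="last"):
    """Return rows sorted by column `by`, keeping missing values at one end."""
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")

    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    missing = [r for r in rows if is_missing(r[by])]
    present = [r for r in rows if not is_missing(r[by])]
    # Only the present values are affected by `ascending`.
    present.sort(key=lambda r: r[by], reverse=not ascending)
    return missing + present if na_position == "first" else present + missing


rows = [{"x": 3.0}, {"x": None}, {"x": 1.0}]
desc_last = sort_values(rows, "x", ascending=False)      # 3.0, 1.0, None
asc_first = sort_values(rows, "x", na_position="first")  # None, 1.0, 3.0
```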

koalas.Series:

  • alias (as an alias for rename function)
  • count
  • groupby
  • to_numpy
  • to_pandas (with toPandas as an alias for compatibility with Spark)
  • to_string
  • fillna
  • Various stats functions now work with boolean type
  • In notebooks or REPL, automatically display the content of the Series, similar to pandas
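
Automatic display in Python generally relies on two hooks: the plain REPL echoes an object through `__repr__`, and Jupyter/IPython look for `_repr_html_` to render rich output. The sketch below shows that mechanism on a hypothetical `TinyFrame` class; it is an illustration of the hooks, not a description of how Koalas implements them.

```python
class TinyFrame:
    """Hypothetical stand-in for a DataFrame-like object."""

    def __init__(self, columns, rows):
        self.columns = columns
        self.rows = rows

    def to_string(self):
        lines = ["  ".join(self.columns)]
        lines += ["  ".join(str(v) for v in row) for row in self.rows]
        return "\n".join(lines)

    # The plain Python REPL calls __repr__ to echo a bare expression.
    def __repr__(self):
        return self.to_string()

    # Jupyter/IPython look for _repr_html_ to render rich output.
    def _repr_html_(self):
        head = "".join(f"<th>{c}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{v}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><tr>{head}</tr>{body}</table>"


frame = TinyFrame(["a", "b"], [(1, 2), (3, 4)])
text = repr(frame)            # what the REPL would print
html = frame._repr_html_()    # what a notebook would render
```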

We also significantly improved the project's documentation.

Last but not least, we did some major refactoring of the codebase and its infrastructure to make future changes easier, for example:

  • koalas.DataFrame now wraps a Spark DataFrame, rather than directly monkey patching all methods.
  • Doctests are enabled and can be run directly in PyCharm
  • Added mypy as a type hint linter
  • Switched from nose to pytest for test infrastructure.
  • Introduced utility methods to support older versions of pandas. #210
  • Added a code coverage report
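
One motivation for wrapping instead of monkey patching can be sketched with count, which means different things in the two APIs: Spark's DataFrame.count() returns the number of rows, while pandas' DataFrame.count() returns the number of non-missing values per column. A wrapper lets both coexist. The classes below are hypothetical stand-ins that run without Spark, not the Koalas implementation.

```python
class SparkDataFrame:
    """Stand-in for pyspark.sql.DataFrame; only what the sketch needs."""

    def __init__(self, rows):
        self.rows = rows

    def count(self):
        # Spark semantics: number of rows.
        return len(self.rows)


class WrappedFrame:
    """Hypothetical pandas-style wrapper that holds a Spark-style frame."""

    def __init__(self, sdf):
        self._sdf = sdf  # composition: the Spark object stays untouched

    def count(self):
        # pandas semantics: non-missing values per column.
        columns = self._sdf.rows[0].keys() if self._sdf.rows else []
        return {
            c: sum(r[c] is not None for r in self._sdf.rows) for c in columns
        }


sdf = SparkDataFrame([{"x": 1, "y": None}, {"x": 2, "y": 5}])
kdf = WrappedFrame(sdf)
spark_count = sdf.count()   # row count
pandas_count = kdf.count()  # per-column non-missing counts
```

With monkey patching, the two definitions of count would compete for the same attribute on one class; with a wrapper, each object keeps its own behavior.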