Version 0.2.0
We have implemented many major features over the past week. Here is a summary of what's new in release v0.2.0.
spark.DataFrame:
- to_koalas is monkey patched into Spark's DataFrame API when the koalas package is imported
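To illustrate the mechanism behind to_koalas, here is a minimal sketch of monkey patching a method onto a class at import time. It uses hypothetical stand-in classes (SparkDataFrame, KoalasDataFrame) and a hypothetical helper _to_koalas rather than the real pyspark and koalas types, and it assumes the patch is a plain attribute assignment on the Spark class; the actual koalas code may differ.

```python
# Hypothetical stand-ins: these are NOT the real pyspark or koalas classes.
class SparkDataFrame:
    """Stand-in for Spark's DataFrame."""

    def __init__(self, rows):
        self.rows = rows


class KoalasDataFrame:
    """Stand-in for koalas.DataFrame, which holds a reference to a Spark DataFrame."""

    def __init__(self, sdf):
        self._sdf = sdf


def _to_koalas(self):
    """Hypothetical helper: wrap this Spark-style frame in a Koalas-style frame."""
    return KoalasDataFrame(self)


# The monkey patch itself. In a package, a line like this would run at import
# time (e.g. from its __init__.py), so importing the package adds the method
# to every existing and future SparkDataFrame instance.
SparkDataFrame.to_koalas = _to_koalas


sdf = SparkDataFrame([(1, "a"), (2, "b")])
kdf = sdf.to_koalas()  # available only because of the assignment above
```

The trade-off is that the patched class is modified globally, which is convenient for a one-way conversion method like this but hard to maintain for a large API surface.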
koalas.DataFrame:
- count
- corr
- dtypes
- groupby
- sort_values now supports ascending, na_position, and inplace parameters
- to_numpy
- to_pandas (with toPandas as an alias for compatibility with Spark)
- to_string
- Allow assigning a literal directly to create a new column
- Various stats functions now work with boolean type
- In notebooks or REPL, automatically display the content of the DataFrame, similar to pandas
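Automatic display in a REPL or notebook typically relies on Python's standard display hooks: the interactive interpreter prints repr() of a bare expression, and Jupyter/IPython renders an object's _repr_html_ method when one is defined. The sketch below uses a hypothetical MiniFrame class (not koalas code) with an assumed five-row display limit to show the idea; how koalas fetches rows from Spark before rendering is not covered here.

```python
import html


class MiniFrame:
    """Hypothetical frame that renders itself when shown in a REPL or notebook."""

    max_display_rows = 5  # assumed display limit, not a koalas setting

    def __init__(self, columns, rows):
        self.columns = [str(c) for c in columns]
        self.rows = [tuple(row) for row in rows]

    def _shown(self):
        # Only the first few rows are rendered, similar in spirit to pandas.
        return self.rows[: self.max_display_rows]

    def __repr__(self):
        # The interactive interpreter prints repr() of a bare expression.
        cells = [self.columns] + [[str(v) for v in row] for row in self._shown()]
        widths = [max(len(row[i]) for row in cells) for i in range(len(self.columns))]
        lines = ["  ".join(v.rjust(w) for v, w in zip(row, widths)) for row in cells]
        if len(self.rows) > self.max_display_rows:
            lines.append(f"... ({len(self.rows)} rows total)")
        return "\n".join(lines)

    def _repr_html_(self):
        # Jupyter/IPython calls this method, if present, to render rich HTML output.
        head = "".join(f"<th>{html.escape(c)}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self._shown()
        )
        return f"<table><tr>{head}</tr>{body}</table>"
```

Evaluating MiniFrame(["x", "y"], [(1, "a"), (22, "bb")]) as the last expression in a cell or at the >>> prompt is enough to trigger one of these methods; no explicit print or show call is needed.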
koalas.Series:
- alias (as an alias for the rename function)
- count
- groupby
- to_numpy
- to_pandas (with toPandas as an alias for compatibility with Spark)
- to_string
- fillna
- Various stats functions now work with boolean type
- In notebooks or REPL, automatically display the content of the Series, similar to pandas
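Method aliases such as alias for rename, or toPandas for to_pandas, can be provided by binding a second class attribute to the same function. The hypothetical MiniSeries below sketches that pattern; it is not koalas code. Its fillna treats None as the missing-value marker, which is a simplification, and its to_pandas returns a plain list as a stand-in for a pandas object.

```python
class MiniSeries:
    """Hypothetical series used only to illustrate method aliasing."""

    def __init__(self, values, name=None):
        self.values = list(values)
        self.name = name

    def rename(self, name):
        """Return a copy of this series with a new name."""
        return MiniSeries(self.values, name=name)

    def fillna(self, value):
        """Return a copy with missing values (None in this sketch) replaced by value."""
        return MiniSeries([value if v is None else v for v in self.values], name=self.name)

    def to_pandas(self):
        # Stand-in: returns a plain list instead of a pandas object.
        return list(self.values)

    # Aliases: a second name bound to the very same function object.
    alias = rename
    toPandas = to_pandas
```

Because both names refer to one function, the alias cannot drift out of sync with the original method.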
Significantly improved documentation of the project.
Last but not least, we have done some major refactoring of the codebase and its infrastructure to make it more amenable to future changes, for example:
- koalas.DataFrame now wraps a Spark DataFrame instead of monkey patching all methods directly onto it
- Doctests are enabled and can be run directly in PyCharm
- Added the mypy type hint linter
- Switched from nose to pytest for the test infrastructure
- Introduced utility methods to support older versions of pandas (#210)
- Added code coverage reports
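The switch from monkey patching to wrapping can be sketched as follows: the pandas-style class keeps a reference to the Spark-side object and delegates to it, so the Spark class itself is never modified. SparkFrameStub and WrappedFrame are hypothetical stand-ins, not koalas classes. The sketch also carries type hints of the kind mypy checks and a doctest in its docstring, mirroring two of the infrastructure changes listed above.

```python
from typing import Any, Dict, List


class SparkFrameStub:
    """Hypothetical stand-in for a Spark DataFrame: a list of row dicts."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = rows

    def count(self) -> int:
        # Like Spark's DataFrame.count(), returns the number of rows.
        return len(self.rows)


class WrappedFrame:
    """Hypothetical pandas-like frame that wraps, instead of patching, a Spark frame.

    >>> kdf = WrappedFrame(SparkFrameStub([{"x": 1}, {"x": 2}]))
    >>> len(kdf)
    2
    """

    def __init__(self, sdf: SparkFrameStub) -> None:
        # Keep a reference to the Spark-side object; its class is never modified.
        self._sdf = sdf

    def __len__(self) -> int:
        # pandas-style len() implemented by delegating to the wrapped frame.
        return self._sdf.count()


if __name__ == "__main__":
    import doctest

    doctest.testmod()
```

Running the file directly executes the doctest through doctest.testmod(); pytest can also collect doctests, for example with its --doctest-modules option.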