- 423 Add seaborn as a domain level extension for visualization
- 422 Add pandas_df.plot as the first namespace extension
- 421 Add the namespace concept to Fugue extensions
- 420 Add is_distributed to engines
- 419 Log transpiled SQL query upon error
- 384 Expanding Fugue API
- 410 Unify Fugue SQL dialect (syntax only)
- 409 Support arbitrary column names in Fugue
- 404 Ray/Dask engines guess optimal default partitions
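The "guess optimal default partitions" idea can be pictured as a simple size-and-CPU heuristic. The function below is an illustrative sketch with made-up thresholds, not the engines' actual logic:

```python
import math
import os

def guess_npartitions(nrows, target_rows=1_000_000):
    """Toy heuristic (not Ray's/Dask's actual logic): aim for roughly
    target_rows rows per partition, use at least one partition per CPU
    for large inputs, and never create more partitions than rows."""
    cpus = os.cpu_count() or 1
    by_size = max(1, math.ceil(nrows / target_rows))
    return min(max(by_size, cpus), max(nrows, 1))

print(guess_npartitions(100))         # small input: bounded by row count
print(guess_npartitions(50_000_000))  # large input: driven by data size
```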
- 403 Deprecate register_raw_df_type
- 392 Fixed intermittent failures of aggregations on Spark dataframes
- 398 Rework API Docs and Favicon
- 393 ExecutionEngine as_context
- 385 Remove DataFrame metadata
- 381 Change SparkExecutionEngine to use pandas udf by default
- 380 Refactor ExecutionEngine (Separate out MapEngine)
- 378 Refactor DataFrame show
- 377 Create bag
- 372 Infer execution engine from input
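Engine inference can be pictured as type-based dispatch on the inputs. The mapping below is a toy sketch; `FakeSparkDF` and the engine names are illustrative, not Fugue's real resolution logic:

```python
# Toy sketch: pick an engine name from the input dataframe's type,
# mirroring the idea of inferring the execution engine from inputs
# instead of requiring the user to specify it explicitly.
def infer_engine(df):
    module = type(df).__module__.split(".")[0]
    mapping = {"pandas": "native", "pyspark": "spark", "dask": "dask"}
    return mapping.get(module, "native")

class FakeSparkDF:
    pass

FakeSparkDF.__module__ = "pyspark.sql"  # pretend it came from pyspark

print(infer_engine([1, 2, 3]))      # builtins -> native
print(infer_engine(FakeSparkDF()))  # -> spark
```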
- 340 Migrate to plugin mode
- 369 Remove execution from FugueWorkflow context manager, remove engine from FugueWorkflow
- 373 Fixed Spark engine rename slowness when there are a lot of columns
- 362 Remove Python 3.6 Support
- 363 Create IbisDataFrame and IbisExecutionEngine
- 364 Enable Map type support
- 365 Support column names starting with numbers
- 361 Better error message for cross join
- 345: Enabled file as input/output for transform and out_transform
- 326: Added tests for Python 3.6 - 3.10 for Linux and 3.7 - 3.9 for Windows. Updated devenv and CICD to Python 3.8.
- 321: Moved out Fugue SQL to https://github.com/fugue-project/fugue-sql-antlr, removed version cap of antlr4-python3-runtime
- 323: Removed version cap of DuckDB
- 334: Replaced RLock with SerializableRLock
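The idea behind a serializable lock fits in a few lines: exclude the lock from the pickled state and recreate a fresh one on load. This is a simplified stand-in for triad's `SerializableRLock`, not its actual implementation:

```python
import pickle
import threading

class SerializableRLock:
    """Sketch of the serializable-lock idea: a plain threading.RLock
    cannot be pickled, so drop it from the pickled state and build a
    fresh lock when the object is deserialized."""
    def __init__(self):
        self._lock = threading.RLock()

    def __enter__(self):
        return self._lock.__enter__()

    def __exit__(self, *exc):
        return self._lock.__exit__(*exc)

    def __getstate__(self):
        return {}  # the lock itself is never serialized

    def __setstate__(self, state):
        self._lock = threading.RLock()

lock = SerializableRLock()
with lock:
    copied = pickle.loads(pickle.dumps(lock))  # works even while held
with copied:
    pass
```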
- 337: Fixed index warning in fugue_dask
- 339: Migrated execution engine parsing to triad conditional_dispatcher
- 341: Added Dask Client to DaskExecutionEngine, and fixed bugs of Dask and Duckdb
- Create a hybrid engine of DuckDB and Dask
- Save Spark-like partitioned parquet files for all engines
- Enable DaskExecutionEngine to transform dataframes with nested columns
- A smarter way to determine default npartitions in Dask
- Support even partitioning on Dask
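"Even partitioning" means splitting rows into partitions whose sizes differ by at most one. The helper below illustrates that goal; it is not Dask's or Fugue's internal code:

```python
def even_partitions(n_items, n_partitions):
    """Split n_items into n_partitions whose sizes differ by at most 1,
    e.g. 10 items into 3 partitions -> [4, 3, 3]."""
    base, extra = divmod(n_items, n_partitions)
    return [base + (1 if i < extra else 0) for i in range(n_partitions)]

print(even_partitions(10, 3))  # [4, 3, 3]
```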
- Add handling of nested ArrayType on Spark
- Change to plugin approach to avoid explicit import
- Fixed Click version issue
- Added version caps for antlr4-python3-runtime and duckdb as they both released new versions with breaking changes.
- Make Fugue exceptions short and useful
- Ibis integration (experimental)
- Get rid of simple assignment (not used at all)
- Improve DuckDB engine to use a real DuckDB ExecutionEngine
- YIELD LOCAL DATAFRAME
- Add an option to transform to turn off native dataframe output
- Add callback parameter to `transform` and `out_transform`
- Support DuckDB
- Create fsql_ignore_case for convenience, make this an option in notebook setup
- Make Fugue SQL error more informative about case issue
- Enable pandas default SQL engine (QPD) to take lower case SQL
- Change pickle to cloudpickle for Flask RPC Server
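Context for the switch: standard `pickle` cannot serialize lambdas and closures, which callback handlers often are, while `cloudpickle` can. A stdlib-only demonstration of the limitation:

```python
import pickle

# Plain pickle fails on a lambda because it serializes functions by
# reference (module + qualified name), and "<lambda>" cannot be looked
# up on unpickling. cloudpickle serializes the code object instead.
callback = lambda x: x + 1
try:
    pickle.dumps(callback)
    ok = True
except Exception:
    ok = False
print("plain pickle handled the lambda:", ok)
# cloudpickle.dumps(callback) would succeed (requires the cloudpickle package)
```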
- Add license to package
- Parsed arbitrary object into execution engine
- Made Fugue SQL accept `+`, `~`, `-` in schema expression
- Fixed transform bug for Fugue DataFrames
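A toy interpreter shows what `+`/`~`/`-` style schema operations can mean. This is an illustrative sketch only; the operator semantics assumed here (add / drop-if-exists / drop) are not Fugue's actual schema grammar:

```python
# Hypothetical semantics for schema-expression operators:
#   +name:type  add a column
#   -name       remove a column (must exist)
#   ~name       remove a column if it exists
def apply_schema_ops(schema, ops):
    schema = dict(schema)  # name -> type
    for op in ops:
        if op.startswith("+"):
            name, typ = op[1:].split(":")
            schema[name] = typ
        elif op.startswith("-"):
            del schema[op[1:]]
        elif op.startswith("~"):
            schema.pop(op[1:], None)
    return schema

print(apply_schema_ops({"a": "int"}, ["+b:str", "~c", "-a"]))  # {'b': 'str'}
```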
- Fixed a very rare bug of annotation parsing
- Added Select, Aggregate, Filter, Assign interfaces
- Made compatible with Windows, added GitHub Actions to test on Windows
- Register built-in extensions
- Accept platform dependent annotations for dataframes and execution engines
- Let SparkExecutionEngine accept empty pandas dataframes
- Move to codecov
- Let Fugue SQL take input dataframes with name such as a.b
- Dask repartitioning improvement
- Separate Dask IO to use its own APIs
- Improved Dask print function by adding back head
- Made `assert_or_throw` lazy
- Improved notebook setup handling for Jupyter Lab
- HOTFIX avro support
- Added built-in Avro support
- Fixed dask print bug
- Added Codacy and Slack channel badges, fixed pylint
- Created transform and out_transform functions
- Added partition syntax sugar
- Fixed FugueSQL `CONNECT` bug
- Fugueless
- Notebook experience and extension
- NativeExecutionEngine: switched to use QPD for SQL
- Spark pandas udf: migrate to applyInPandas and mapInPandas
- Fixed SparkExecutionEngine `take` bug
- Fugue SQL: PRINT ROWS n -> PRINT n ROWS|ROW
- Refactor yield
- Fixed Jinja templating issue
- Changed `_parse_presort_exp` from a private function to a public one
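A simplified stand-in shows what a presort-expression parser does; the real, now-public function lives in Fugue's utilities and handles far more edge cases:

```python
def parse_presort(expr):
    """Parse a presort expression like "a ASC, b DESC" into
    [(column, ascending)] pairs. A minimal sketch; a column with
    no direction defaults to ascending."""
    result = []
    for part in expr.split(","):
        tokens = part.strip().split()
        col = tokens[0]
        ascending = len(tokens) == 1 or tokens[1].lower() == "asc"
        result.append((col, ascending))
    return result

print(parse_presort("a ASC, b DESC, c"))  # [('a', True), ('b', False), ('c', True)]
```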
- Changed the noisy "failure to delete execution temp directory" message to info level
- Limit and Limit by Partition
- Fixed README code so it works as written
- Limit was renamed to take and added to SQL interface
- RPC for Callbacks to collect information from workers in real time
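The callback idea can be sketched without the RPC machinery: workers report through a callable and the driver aggregates in real time. `ProgressCollector` below is hypothetical; Fugue's actual mechanism routes such callbacks over RPC so remote workers (e.g. Spark executors) can reach the driver:

```python
import threading

class ProgressCollector:
    """Toy driver-side aggregator: workers call it with a count and it
    accumulates under a lock, so results arrive while work is running."""
    def __init__(self):
        self._lock = threading.Lock()
        self.rows_done = 0

    def __call__(self, n):
        with self._lock:
            self.rows_done += n

collector = ProgressCollector()

def worker(partition, callback):
    # ... process the partition, then report progress ...
    callback(len(partition))

for part in [[1, 2], [3], [4, 5, 6]]:
    worker(part, collector)
print(collector.rows_done)  # 6
```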
- Changes in handling input dataframe determinism. This fixes a bug related to thread locks with Spark DataFrames because of a deepcopy.
- sample function
- Make CSV schema inference consistent across engines
- Make file loading more consistent across engines
- Support `**kwargs` in interfaceless extensions
- Support `Iterable[pd.DataFrame]` as output type
- Alter column types
- RENAME in Fugue SQL
- CONNECT different SQL service in Fugue SQL
- Fixed Spark EVEN REPARTITION issue
- Add hook to print/show
- Fixed import issue with OutputTransformer
- Added fillna as a built-in transform, including SQL implementation
- Extension validation interface and interfaceless syntax
- Passing dataframes cross workflow (yield)
- OUT TRANSFORM to transform and finish a branch of execution
- Fixed a PandasDataFrame datetime issue that only happened in transformer interface approach
- Unified checkpoints and persist
- Drop columns and na implementations in both programming and sql interfaces
- Presort takes array as input
- Fixed jinja template rendering issue
- Fixed path format detection bug
- Require pandas 1.0 because of parquet schema
- Improved Fugue SQL extension parsing logic
- Doc for contributors to setup their environment
- Added set operations to the programming interface: `union`, `subtract`, `intersect`
- Added `distinct` to the programming interface
- Ensured partitioning follows SQL convention: groups with null keys are NOT removed
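"SQL convention" here means a null grouping key forms its own group instead of being silently dropped. A quick plain-Python illustration (for comparison, pandas' `groupby` drops NaN keys by default unless `dropna=False` is passed):

```python
# Group values by key, keeping None as a legitimate grouping key,
# the way SQL GROUP BY keeps a group for NULL keys.
rows = [("a", 1), (None, 2), ("a", 3), (None, 4)]
groups = {}
for key, val in rows:
    groups.setdefault(key, []).append(val)
print(groups)  # {'a': [1, 3], None: [2, 4]}
```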
- Switched `join`, `union`, `subtract`, `intersect`, `distinct` to QPD implementations, so they follow SQL convention
- Set operations in Fugue SQL can directly operate on Fugue statements (e.g. `TRANSFORM USING t1 UNION TRANSFORM USING t2`)
- Fixed bugs
- Added onboarding document for contributors
- Main features of Fugue core and Fugue SQL
- Support backends: Pandas, Spark and Dask