The primary focus of this release has been to reduce the amount of code in the library and, with it, the complexity of the codebase. This has mainly meant removing functionality, generally where it was redundant, so that utility is not reduced.
Secondary improvements have focused on performance and data handling.
Functionally, this release adds the ability to partition data by arbitrary columns.
Note: this list does not include all changes.
Data Handling
D1 The Writers automatically create a zonemap.index containing data profile information. This is intended to improve read performance by acting as a BRIN (Block Range Index), helping to eliminate Blobs from reads.
D2 Support for ORC files has been added to the Readers and Writers.
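To illustrate the BRIN idea behind D1, the sketch below records a per-blob min/max profile for each column and uses it to skip blobs whose range cannot contain a searched value. It assumes a simple in-memory layout (blob name mapped to column mapped to a (min, max) pair); the helper names profile_blob and blobs_to_read are hypothetical, and this is not the actual format of zonemap.index.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical zone map layout: blob name -> column -> (min, max).
# Illustration only; not the real zonemap.index format.
ZoneMap = Dict[str, Dict[str, Tuple[Any, Any]]]


def profile_blob(rows: List[Dict[str, Any]]) -> Dict[str, Tuple[Any, Any]]:
    """Record the min and max of every non-null column value in a blob."""
    profile: Dict[str, Tuple[Any, Any]] = {}
    for row in rows:
        for column, value in row.items():
            if value is None:
                continue
            low, high = profile.get(column, (value, value))
            profile[column] = (min(low, value), max(high, value))
    return profile


def blobs_to_read(zonemap: ZoneMap, column: str, value: Any) -> List[str]:
    """Return only the blobs whose [min, max] range could contain `value`.

    Blobs with no profile for the column cannot be ruled out, so they are kept.
    """
    candidates = []
    for blob, profile in zonemap.items():
        if column not in profile:
            candidates.append(blob)
            continue
        low, high = profile[column]
        if low <= value <= high:
            candidates.append(blob)
    return candidates


zonemap = {
    "blob_0001.jsonl": profile_blob([{"id": 1}, {"id": 50}]),
    "blob_0002.jsonl": profile_blob([{"id": 51}, {"id": 100}]),
}
print(blobs_to_read(zonemap, "id", 75))  # ['blob_0002.jsonl']
```

Because a zone map only stores ranges, it can produce false positives (a blob is read but holds no match) but never false negatives, which is what makes it safe to use for pruning.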
API
A1 ⚠️BREAKING⚠️ - Schema validation is stricter and uses a different language to describe the types.
A2 ⚠️BREAKING⚠️ - The SqlReader has been removed and is being rewritten in Opteryx.
A3 ⚠️BREAKING⚠️ - The "matches" operator has been renamed to "similar to", in line with other SQL engines.
A4 ⚠️BREAKING⚠️ - The raw_path parameter and the partitioning placeholders in dataset names on Readers and Writers have been deprecated. Their functionality is replaced by a new parameter, date_partitions, which accepts a Tuple (or List). This Tuple is used to partition data in increasing resolution, e.g. ("year_{yyyy}", "month_{mm}", "day_{dd}"). The raw_path functionality can be achieved by setting the partitioning to None or an empty Tuple. This change also adds new date placeholders for hours ({HH}), minutes ({MM}) and seconds ({SS}).
A5 The DictSet (dictionary-based) representation has been removed and replaced with a new class, Relation (tuple-based); this is still row-orientated. With this change, STORAGE_CLASS has also been removed. Relation aims to be comparable in functionality to DictSet, but not identical.
A6 The Project parameter on GCS Readers and Writers is now optional.
A7 The ability to filter on reads has been removed.
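To show how a date_partitions Tuple from A4 can expand into a path of increasing resolution, here is a minimal sketch. The function partition_path, the "/" separator and the exact joining behaviour are assumptions for illustration; only the placeholder names ({yyyy}, {mm}, {dd}, {HH}, {MM}, {SS}) come from the notes above. Note that the placeholders are case-sensitive: {mm} is the month and {MM} is the minute.

```python
import datetime
from typing import Optional, Sequence

# Placeholder -> strftime code. Case matters: {mm} is month, {MM} is minute.
PLACEHOLDERS = {
    "{yyyy}": "%Y",
    "{mm}": "%m",
    "{dd}": "%d",
    "{HH}": "%H",
    "{MM}": "%M",
    "{SS}": "%S",
}


def partition_path(
    dataset: str,
    date_partitions: Optional[Sequence[str]],
    when: datetime.datetime,
) -> str:
    """Hypothetical helper: append one path segment per partition level.

    Levels are applied in order, coarsest first. None or an empty sequence
    leaves the dataset name unchanged, mirroring the old raw_path behaviour.
    """
    parts = [dataset.rstrip("/")]
    for level in date_partitions or ():
        for placeholder, fmt in PLACEHOLDERS.items():
            level = level.replace(placeholder, when.strftime(fmt))
        parts.append(level)
    return "/".join(parts)


when = datetime.datetime(2022, 3, 7, 14, 5, 9)
print(partition_path("my_dataset", ("year_{yyyy}", "month_{mm}", "day_{dd}"), when))
# my_dataset/year_2022/month_03/day_07
print(partition_path("my_dataset", None, when))
# my_dataset
```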
Internals
I1 The Writers now determine the name of the blob to be written when the WAL is created, rather than when it is committed to disk; this supports D1.
I2 Some internal representations of numbers have moved to the Decimal type.
I3 Indexing functionality has been removed; the benefits were minimal and it added complexity to the code.
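One way to picture why I1 supports D1: if the blob name is fixed when the write-ahead buffer is opened, profile information can be accumulated under that final name while rows are still being appended, instead of being re-keyed at commit time. The sketch below is hypothetical (the class SketchWriter, the uuid-based naming and the in-memory storage and zone map dictionaries are all assumptions) and is not how the library's Writers are actually implemented.

```python
import uuid
from typing import Any, Dict, List, Tuple

# Hypothetical in-memory stand-ins for blob storage and the zone map.
Storage = Dict[str, List[Dict[str, Any]]]
ZoneMap = Dict[str, Dict[str, Tuple[Any, Any]]]


class SketchWriter:
    """Illustrative writer whose blob name is fixed when its WAL is created."""

    def __init__(self, prefix: str, zonemap: ZoneMap):
        # The name is chosen up front, so profile entries can be keyed by it
        # while rows are still only in the write-ahead buffer.
        self.blob_name = f"{prefix}/{uuid.uuid4().hex}.jsonl"
        self.wal: List[Dict[str, Any]] = []
        self.zonemap = zonemap
        self.zonemap[self.blob_name] = {}

    def append(self, row: Dict[str, Any]) -> None:
        # Buffer the row and update the running min/max for each column.
        self.wal.append(row)
        profile = self.zonemap[self.blob_name]
        for column, value in row.items():
            if value is None:
                continue
            low, high = profile.get(column, (value, value))
            profile[column] = (min(low, value), max(high, value))

    def commit(self, storage: Storage) -> str:
        # Write the buffered rows under the name already recorded in the zone map.
        storage[self.blob_name] = list(self.wal)
        self.wal.clear()
        return self.blob_name


storage: Storage = {}
zonemap: ZoneMap = {}
writer = SketchWriter("my_dataset", zonemap)
writer.append({"id": 3})
writer.append({"id": 9})
name = writer.commit(storage)
print(zonemap[name])  # {'id': (3, 9)}
```

A real implementation would also need to drop the zone map entry if a buffer is abandoned without being committed; the sketch leaves that out for brevity.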