Major backend overhaul which defines column layouts and data types in the same interchangeable / extensible manner as storage backends. This will allow rapid development of new layouts and data type support as new use cases are discovered by the community. (#184) @rlizzo
Column and backend classes are now fully serializable (pickleable) for read-only checkouts. (#180) @rlizzo
Modularized internal structure of API classes to easily allow new column layouts / data types to be added in the future. (#180) @rlizzo
Improved type / value checking of manual specification for column backend and backend_options. (#180) @rlizzo
Standardized column data access API to follow the dict methods API of the Python standard library. (#180) @rlizzo
Memory usage of arrayset checkouts has been reduced by ~70% by using C-structs for allocating sample record locating info. (#179) @rlizzo
Read times from the HDF5_00 and HDF5_01 backends have been reduced by 33-38% (or more for arraysets with many samples) by eliminating redundant computation of the chunked storage B-Tree. (#179) @rlizzo
Commit times and checkout times have been reduced by 11-18% by optimizing record parsing and memory allocation. (#179) @rlizzo
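As a rough illustration of what "following the dict methods API" means for read access, here is a minimal sketch. `ReadOnlyColumn` is a hypothetical stand-in written for this example, not Hangar's actual column class; it only shows how implementing three methods on top of `collections.abc.Mapping` yields `keys()`, `values()`, `items()`, `get()`, and `in` checks.

```python
from collections.abc import Mapping


class ReadOnlyColumn(Mapping):
    """Hypothetical stand-in for a column; NOT Hangar's actual class."""

    def __init__(self, samples):
        # Copy the input so later changes to the caller's dict are not visible here.
        self._samples = dict(samples)

    def __getitem__(self, key):
        return self._samples[key]

    def __iter__(self):
        return iter(self._samples)

    def __len__(self):
        return len(self._samples)


# Mapping supplies keys(), values(), items(), get() and __contains__ for free.
col = ReadOnlyColumn({"sample_0": [1, 2, 3], "sample_1": [4, 5, 6]})
first = col["sample_0"]
missing = col.get("sample_9")  # returns None instead of raising KeyError
names = list(col.keys())
```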
New Features
Added a str type column with the same behavior as the ndarray column (supporting both single-level and nested layouts), replacing the functionality of the removed metadata container. (#184) @rlizzo
New backend based on LMDB has been added (specifier of lmdb_30). (#184) @rlizzo
Added .diff() method to Repository class to enable diffing changes between any pair of commits / branches without needing to open the diff base in a checkout. (#183) @rlizzo
New CLI command hangar diff which reports a summary view of changes made between any pair of commits / branches. (#183) @rlizzo
Added .log() method to Checkout objects so that a graphical commit graph or machine-readable commit details / DAG can be queried when operating on a particular commit. (#183) @rlizzo
"string" type columns now supported alongside "ndarray" column type. (#180) @rlizzo
New "column" API, which replaces "arrayset" name. (#180) @rlizzo
Arraysets can now contain "nested subsamples" under a common sample key. (#179) @rlizzo
New API to add and remove samples from an arrayset. (#179) @rlizzo
Added repo.size_nbytes and repo.size_human to report the disk usage of a repository. (#174) @rlizzo
Added method to traverse the entire repository history and cryptographically verify integrity. (#173) @rlizzo
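The general idea behind cryptographically verifying a history can be sketched as follows. This is not Hangar's implementation: the function names, the use of SHA-256, and the linear (non-branching) history are all assumptions made for illustration, whereas a real repository history is a DAG with its own record format.

```python
import hashlib


def commit_digest(parent_digest, payload):
    """Hypothetical digest: hash of the parent's digest followed by the commit payload."""
    h = hashlib.sha256()
    h.update(parent_digest.encode("utf-8"))
    h.update(payload)
    return h.hexdigest()


def build_history(payloads):
    """Build a linear chain of (digest, parent_digest, payload) tuples, root first."""
    commits = []
    parent = ""  # assumption: the root commit has an empty parent digest
    for payload in payloads:
        digest = commit_digest(parent, payload)
        commits.append((digest, parent, payload))
        parent = digest
    return commits


def verify_history(commits):
    """Walk the chain from the root and recompute every digest."""
    expected_parent = ""
    for digest, parent, payload in commits:
        if parent != expected_parent:
            return False  # chain is broken or reordered
        if commit_digest(parent, payload) != digest:
            return False  # payload no longer matches its recorded digest
        expected_parent = digest
    return True


history = build_history([b"add column", b"add samples", b"remove sample"])

# Simulate corruption: change one payload but keep its recorded digest.
tampered = list(history)
tampered[1] = (history[1][0], history[1][1], b"add other samples")
```

Because each digest covers its parent's digest, altering any commit invalidates that commit and every commit after it, which is why a single pass from the root is enough in this simplified setting.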
Changes
Argument syntax of __getitem__() and get() methods of ReaderCheckout and WriterCheckout classes. The new format supports handling arbitrary arguments specific to retrieval of data from any column type. (#183) @rlizzo
Removed
metadata container for str typed data has been completely removed. It is replaced by a highly extensible and much more user-friendly str typed column. (#184) @rlizzo
__setitem__() method in WriterCheckout objects. Writing data to columns via a checkout object is no longer supported. (#183) @rlizzo
Bug Fixes
Backend data stores no longer use file symlinks, improving compatibility with some types of file systems. (#171) @rlizzo
All arrayset types ("flat" and "nested subsamples") and backend readers can now be pickled -- for parallel processing -- in a read-only checkout. (#179) @rlizzo
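One common way to make an object that holds open file handles picklable is to drop the handle when pickling and reopen it lazily afterwards. The sketch below shows that pattern with a hypothetical `ReadOnlyReader` class; it is an assumption about the general technique, not a description of how Hangar's backend readers are written.

```python
import os
import pickle
import tempfile


class ReadOnlyReader:
    """Hypothetical read-only reader; NOT one of Hangar's backend classes."""

    def __init__(self, path):
        self.path = path
        self._handle = None  # opened lazily on first read

    def _open(self):
        if self._handle is None:
            self._handle = open(self.path, "rb")
        return self._handle

    def read(self):
        fh = self._open()
        fh.seek(0)
        return fh.read()

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None

    def __getstate__(self):
        # Open file objects cannot be pickled, so ship only the path.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as f:
        f.write(b"sample bytes")

    reader = ReadOnlyReader(path)
    original = reader.read()  # the reader now holds an open handle

    # Round-trip through pickle, as multiprocessing would do for a worker.
    clone = pickle.loads(pickle.dumps(reader))
    handle_after_unpickle = clone._handle
    restored = clone.read()  # the clone reopens the file on demand

    reader.close()
    clone.close()
```

Restricting this to read-only access keeps the pattern simple, since several processes can reopen the same file without coordinating writes.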
Breaking changes
New backend record serialization format is incompatible with repositories written in version 0.4 or earlier.
New arrayset API is incompatible with Hangar API in version 0.4 or earlier.