v0.5.0 Release

@rlizzo rlizzo released this 04 Apr 11:54
· 40 commits to master since this release
01c94bd

v0.5.0 (2020-04-04)

Improvements

  • Python 3.8 is now fully supported. (#193) @rlizzo
  • Major backend overhaul which defines column layouts and data types in the same interchangeable / extensible manner as storage backends. This will allow rapid development of new layouts and data type support as new use cases are discovered by the community. (#184) @rlizzo
  • Column and backend classes are now fully serializable (pickleable) for read-only checkouts. (#180) @rlizzo
  • Modularized internal structure of API classes to easily allow new column layouts / data types to be added in the future. (#180) @rlizzo
  • Improved type / value checking of manual specification for column backend and backend_options. (#180) @rlizzo
  • Standardized column data access API to follow the Python standard library dict methods API. (#180) @rlizzo
  • Memory usage of arrayset checkouts has been reduced by ~70% by using C-structs for allocating sample record locating info. (#179) @rlizzo
  • Read times from the HDF5_00 and HDF5_01 backends have been reduced by 33-38% (or more for arraysets with many samples) by eliminating redundant computation of the chunked storage B-Tree. (#179) @rlizzo
  • Commit times and checkout times have been reduced by 11-18% by optimizing record parsing and memory allocation. (#179) @rlizzo
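
As an illustration of what "following the dict methods API" (#180) means in practice, here is a minimal stand-alone sketch. `ToyColumn` is a hypothetical class invented for this example and is not part of Hangar; it only shows how implementing five core methods on top of `collections.abc.MutableMapping` yields the familiar `get()`, `keys()`, `update()`, `pop()` and `in` behavior.

```python
from collections.abc import MutableMapping


class ToyColumn(MutableMapping):
    """Hypothetical in-memory column; NOT Hangar's implementation."""

    def __init__(self):
        self._samples = {}

    def __getitem__(self, key):
        return self._samples[key]

    def __setitem__(self, key, value):
        self._samples[key] = value

    def __delitem__(self, key):
        del self._samples[key]

    def __iter__(self):
        return iter(self._samples)

    def __len__(self):
        return len(self._samples)


col = ToyColumn()
col["a"] = [1, 2, 3]                  # dict-style assignment
col.update({"b": [4, 5], "c": [6]})   # inherited from MutableMapping
removed = col.pop("c")                # inherited from MutableMapping
keys = sorted(col.keys())
missing = col.get("zzz", None)        # default returned for absent keys
```

Code written against the mapping protocol in this way does not need to know which column type or storage backend sits underneath.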

New Features

  • Added str type column with the same behavior as the ndarray column (supporting both single-level and nested layouts), replacing the functionality of the removed metadata container. (#184) @rlizzo
  • New backend based on LMDB has been added (specifier of lmdb_30). (#184) @rlizzo
  • Added .diff() method to Repository class to enable diffing changes between any pair of commits / branches without needing to open the diff base in a checkout. (#183) @rlizzo
  • New CLI command hangar diff which reports a summary view of changes made between any pair of commits / branches. (#183) @rlizzo
  • Added .log() method to Checkout objects so graphical commit graph or machine readable commit details / DAG can be queried when operating on a particular commit. (#183) @rlizzo
  • "string" type columns now supported alongside "ndarray" column type. (#180) @rlizzo
  • New "column" API, which replaces "arrayset" name. (#180) @rlizzo
  • Arraysets can now contain "nested subsamples" under a common sample key. (#179) @rlizzo
  • New API to add and remove samples from an arrayset. (#179) @rlizzo
  • Added repo.size_nbytes and repo.size_human to report the disk usage of a repository. (#174) @rlizzo
  • Added method to traverse the entire repository history and cryptographically verify integrity. (#173) @rlizzo
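
The idea behind traversing history and cryptographically verifying integrity (#173) can be sketched with a simple hash chain. Everything below is an assumption made for illustration: these notes do not describe Hangar's hashing scheme or record format, so the function names, the use of SHA-256, and the strictly linear history are all hypothetical.

```python
import hashlib


def commit_digest(parent, payload):
    """Bind a commit's payload to the digest of its parent commit."""
    h = hashlib.sha256()
    h.update((parent or "").encode())
    h.update(payload)
    return h.hexdigest()


def build_history(payloads):
    """Return ({digest: (parent_digest, payload)}, head_digest)."""
    history, parent = {}, None
    for payload in payloads:
        digest = commit_digest(parent, payload)
        history[digest] = (parent, payload)
        parent = digest
    return history, parent


def verify_history(history, head):
    """Walk from head to root, recomputing every digest along the way."""
    current = head
    while current is not None:
        if current not in history:
            return False          # a referenced commit is missing
        parent, payload = history[current]
        if commit_digest(parent, payload) != current:
            return False          # stored content no longer matches its digest
        current = parent
    return True


history, head = build_history([b"first", b"second", b"third"])
ok_before = verify_history(history, head)

# Simulate corruption of the newest commit's stored payload.
parent_of_head, _ = history[head]
history[head] = (parent_of_head, b"tampered")
ok_after = verify_history(history, head)
```

Because each digest covers the parent's digest, changing any stored commit invalidates the check for that commit during the walk.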

Changes

  • Argument syntax of __getitem__() and get() methods of ReaderCheckout and WriterCheckout classes. The new format supports handling arbitrary arguments specific to retrieval of data from any column type. (#183) @rlizzo

Removed

  • metadata container for str typed data has been completely removed. It is replaced by a highly extensible and much more user-friendly str typed column. (#184) @rlizzo
  • __setitem__() method in WriterCheckout objects. Writing data to columns via a checkout object is no longer supported. (#183) @rlizzo

Bug Fixes

  • Backend data stores no longer use file symlinks, improving compatibility with some types of file systems. (#171) @rlizzo
  • All arrayset types ("flat" and "nested subsamples") and backend readers can now be pickled -- for parallel processing -- in a read-only checkout. (#179) @rlizzo
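
One common way to make a file-backed reader picklable only in read-only mode is to drop the open file handle when pickling and reopen it when unpickling. The sketch below shows that general pattern under stated assumptions: `ToyReader` is a hypothetical class, not Hangar's backend code, and the notes do not say that Hangar implements pickling this way.

```python
import os
import pickle
import tempfile


class ToyReader:
    """Hypothetical file-backed reader; NOT Hangar's backend code."""

    def __init__(self, path, mode="r"):
        self.path = path
        self.mode = mode
        self._fh = open(path, "rb" if mode == "r" else "ab")

    def read_all(self):
        self._fh.seek(0)
        return self._fh.read()

    def close(self):
        self._fh.close()

    def __getstate__(self):
        if self.mode != "r":
            raise pickle.PicklingError("only read-only readers can be pickled")
        # Open file handles cannot be pickled; keep only what is needed to reopen.
        return {"path": self.path, "mode": self.mode}

    def __setstate__(self, state):
        # Reopen the file in the process that unpickles the reader.
        self.__init__(state["path"], state["mode"])


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as f:
        f.write(b"sample-bytes")

    reader = ToyReader(path)
    clone = pickle.loads(pickle.dumps(reader))   # round trip, as a worker would
    clone_data = clone.read_all()

    writer = ToyReader(path, mode="a")
    try:
        pickle.dumps(writer)
        writer_pickled = True
    except pickle.PicklingError:
        writer_pickled = False

    for obj in (reader, clone, writer):
        obj.close()
```

Restricting pickling to read-only objects avoids several processes holding write access to the same files.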

Breaking changes

  • New backend record serialization format is incompatible with repositories written in version 0.4 or earlier.
  • New arrayset API is incompatible with Hangar API in version 0.4 or earlier.