08 May 18:21

rlizzo

88922c7

v0.5.2 Release (Latest)

v0.5.2 (2020-05-08)

New Features

New column data type supporting arbitrary bytes data. (#198) @rlizzo

Improvements

str typed columns can now accept data containing any Unicode code point. In prior releases, data containing any non-ASCII character could not be written to this column type. (#198) @rlizzo
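The difference between the old and new behavior can be sketched with plain Python string encoding rather than Hangar's internals. The helper names below are hypothetical. An ASCII-only encoder rejects any code point above U+007F, while UTF-8 round-trips such text. Strict UTF-8 still cannot encode lone surrogate code points such as U+D800.

```python
def encode_ascii_only(value: str) -> bytes:
    # ASCII-only restriction (sketch): rejects code points above U+007F.
    return value.encode("ascii")


def encode_utf8(value: str) -> bytes:
    # UTF-8 encodes every code point except lone surrogates (U+D800..U+DFFF).
    return value.encode("utf-8")


def decode_utf8(raw: bytes) -> str:
    return raw.decode("utf-8")


sample = "naïve café 数据 🚀"

try:
    encode_ascii_only(sample)
except UnicodeEncodeError as err:
    print("ASCII encoding failed:", err.reason)

# UTF-8 round-trips the same text without loss.
assert decode_utf8(encode_utf8(sample)) == sample
```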

Bug Fixes

Fixed an issue where str and (newly added) bytes column data could not be fetched / pushed between a local client repository and a remote server. (#198) @rlizzo

06 Apr 16:07

rlizzo

v0.5.1

15761c8

Release v0.5.1

v0.5.1 (2020-04-05)

Bug Fixes

Fixed an issue where importing make_torch_dataloader or make_tf_dataloader under Python 3.6 would raise a NameError regardless of whether the package was installed. (#196) @rlizzo
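The note does not describe the root cause of the NameError. One common pattern keeps such imports from failing: defer the check for an optional framework until the function is actually called. The sketch below uses a hypothetical optional_import helper and a hypothetical make_torch_dataloader_sketch function; neither is Hangar's code.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Return the module if it can be imported, otherwise None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return None


def make_torch_dataloader_sketch(dataset):
    # Hypothetical stand-in: the framework is looked up at call time, so
    # importing this function never fails, whether or not PyTorch is installed.
    torch = optional_import("torch")
    if torch is None:
        raise ImportError("make_torch_dataloader_sketch requires PyTorch")
    return torch.utils.data.DataLoader(dataset)
```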

04 Apr 11:54

rlizzo

v0.5.0

01c94bd

v0.5.0 Release

v0.5.0 (2020-04-04)

Improvements

Python 3.8 is now fully supported. (#193) @rlizzo
Major backend overhaul which defines column layouts and data types in the same interchangeable / extensible manner as storage backends. This will allow rapid development of new layouts and data type support as new use cases are discovered by the community. (#184) @rlizzo
Column and backend classes are now fully serializable (pickleable) for read-only checkouts. (#180) @rlizzo
Modularized the internal structure of API classes so that new column layouts / data types can easily be added in the future. (#180) @rlizzo
Improved type / value checking of manually specified column backend and backend_options. (#180) @rlizzo
Standardized the column data access API to follow the Python standard library dict methods API. (#180) @rlizzo
Memory usage of arrayset checkouts has been reduced by ~70% by using C-structs for allocating sample record locating info. (#179) @rlizzo
Read times from the HDF5_00 and HDF5_01 backends have been reduced by 33-38% (or more for arraysets with many samples) by eliminating redundant computation of the chunked storage B-Tree. (#179) @rlizzo
Commit times and checkout times have been reduced by 11-18% by optimizing record parsing and memory allocation. (#179) @rlizzo
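A dict-style column API can be sketched by subclassing collections.abc.MutableMapping. Implementing five methods then provides get(), keys(), values(), items(), pop(), update() and the rest of the dict interface. ColumnSketch is a hypothetical in-memory stand-in, not Hangar's column class.

```python
from collections.abc import MutableMapping
from typing import Dict, Iterator


class ColumnSketch(MutableMapping):
    """Hypothetical in-memory column; the dict methods come from MutableMapping."""

    def __init__(self, name: str):
        self.name = name
        self._samples: Dict[str, object] = {}

    def __getitem__(self, key: str) -> object:
        return self._samples[key]

    def __setitem__(self, key: str, value: object) -> None:
        self._samples[key] = value

    def __delitem__(self, key: str) -> None:
        del self._samples[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._samples)

    def __len__(self) -> int:
        return len(self._samples)


col = ColumnSketch("labels")
col["s1"] = "cat"
col.update({"s2": "dog", "s3": "bird"})  # inherited from MutableMapping
col.pop("s3")                            # inherited from MutableMapping
print(col.get("missing", "n/a"))         # inherited from Mapping
print(sorted(col))                       # ['s1', 's2']
```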

New Features

Added a str type column with the same behavior as the ndarray column (supporting both single-level and nested layouts) to replace the functionality of the removed metadata container. (#184) @rlizzo
New backend based on LMDB has been added (specifier lmdb_30). (#184) @rlizzo
Added .diff() method to Repository class to enable diffing changes between any pair of commits / branches without needing to open the diff base in a checkout. (#183) @rlizzo
New CLI command hangar diff which reports a summary view of changes made between any pair of commits / branches. (#183) @rlizzo
Added .log() method to Checkout objects so that a graphical commit graph or machine-readable commit details / DAG can be queried when operating on a particular commit. (#183) @rlizzo
"string" type columns now supported alongside "ndarray" column type. (#180) @rlizzo
New "column" API, which replaces "arrayset" name. (#180) @rlizzo
Arraysets can now contain "nested subsamples" under a common sample key. (#179) @rlizzo
New API to add and remove samples from an arrayset. (#179) @rlizzo
Added repo.size_nbytes and repo.size_human to report disk usage of a repository on disk. (#174) @rlizzo
Added method to traverse the entire repository history and cryptographically verify integrity. (#173) @rlizzo
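Conceptually, diffing two commits means comparing their sample records. The sketch below assumes each commit is a hypothetical flat mapping of sample key to content digest. It classifies keys as added, deleted or mutated, and does not reproduce the return format of Repository.diff().

```python
from typing import Dict, NamedTuple, Set


class DiffSketch(NamedTuple):
    added: Set[str]
    deleted: Set[str]
    mutated: Set[str]


def diff_commits(base: Dict[str, str], head: Dict[str, str]) -> DiffSketch:
    """Compare two snapshots mapping sample key -> content digest."""
    base_keys, head_keys = set(base), set(head)
    shared = base_keys & head_keys
    return DiffSketch(
        added=head_keys - base_keys,
        deleted=base_keys - head_keys,
        mutated={key for key in shared if base[key] != head[key]},
    )


master = {"img/0": "a1", "img/1": "b2", "img/2": "c3"}
dev = {"img/0": "a1", "img/1": "ff", "img/3": "d4"}
result = diff_commits(master, dev)
print(result)  # DiffSketch(added={'img/3'}, deleted={'img/2'}, mutated={'img/1'})
```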

Changes

Changed the argument syntax of the __getitem__() and get() methods of the ReaderCheckout and WriterCheckout classes. The new format supports handling arbitrary arguments specific to retrieving data from any column type. (#183) @rlizzo

Removed

The metadata container for str typed data has been completely removed. It is replaced by a highly extensible and much more user-friendly str typed column. (#184) @rlizzo
__setitem__() method in WriterCheckout objects. Writing data to columns via a checkout object is no longer supported. (#183) @rlizzo

Bug Fixes

Backend data stores no longer use file symlinks, improving compatibility with some types of file systems. (#171) @rlizzo
All arrayset types ("flat" and "nested subsamples") and backend readers can now be pickled -- for parallel processing -- in a read-only checkout. (#179) @rlizzo
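Objects holding open file handles cannot be pickled directly. A common way to make a read-only reader picklable is to drop the handle in __getstate__ and reopen the file in __setstate__. The sketch uses a hypothetical FileReaderSketch class, not Hangar's backend code.

```python
import os
import pickle
import tempfile
from typing import Any, Dict


class FileReaderSketch:
    """Hypothetical read-only reader that holds an open file handle."""

    def __init__(self, path: str):
        self.path = path
        self._fh = open(path, "rb")

    def read(self, offset: int, size: int) -> bytes:
        self._fh.seek(offset)
        return self._fh.read(size)

    def close(self) -> None:
        self._fh.close()

    def __getstate__(self) -> Dict[str, Any]:
        state = self.__dict__.copy()
        del state["_fh"]  # open file objects cannot be pickled
        return state

    def __setstate__(self, state: Dict[str, Any]) -> None:
        self.__dict__.update(state)
        self._fh = open(self.path, "rb")  # reopen read-only on the receiving side


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello hangar")

reader = FileReaderSketch(tmp.name)
# Simulates sending the reader to a worker process.
clone = pickle.loads(pickle.dumps(reader))
data = clone.read(6, 6)  # b"hangar"

reader.close()
clone.close()
os.remove(tmp.name)
```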

Breaking changes

New backend record serialization format is incompatible with repositories written in version 0.4 or earlier.
New arrayset API is incompatible with Hangar API in version 0.4 or earlier.

04 Apr 10:03

rlizzo

v0.5.0dev3

e1bb0e8

v0.5.0 Pre-Release 2 (Pre-release)

Pre-Release for v0.5.0. Full Changelog To Follow.

04 Apr 09:21

rlizzo

v0.5.0dev2

fae9052

v0.5.0 Pre-Release (Pre-release)

Pre-Release for v0.5.0. Full Changelog To Follow.

26 Nov 07:01

rlizzo

v0.4.0

be7d40e

Release v0.4.0

Release Notes

New Features

Added ability to delete branch names/pointers from a local repository via both API and CLI. #128 @rlizzo
Added local keyword arg to arrayset key/value iterators to return only locally available samples #131 @rlizzo
Ability to change the backend storage format and options applied to an arrayset after initialization. #133 @rlizzo
Added blosc compression to HDF5 backend by default on PyPI installations. #146 @rlizzo
Added benchmarking suite to test for performance regressions in PRs. #155 @rlizzo
Added new backend optimized to increase speeds for fixed size arrayset access. #160 @rlizzo
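The idea behind the local keyword argument (#131) can be sketched with a generator. The layout is hypothetical: a sample whose data exists only on the remote, for example after a partial clone, is stored as None.

```python
from typing import Dict, Iterator, Optional


def iter_sample_keys(samples: Dict[str, Optional[bytes]], local: bool = False) -> Iterator[str]:
    """Yield sample keys; with local=True, skip samples whose data is remote-only.

    Hypothetical layout: None marks a sample recorded in the commit whose
    data has not been fetched from the remote.
    """
    for key, data in samples.items():
        if local and data is None:
            continue
        yield key


samples = {"a": b"\x01", "b": None, "c": b"\x03"}
print(list(iter_sample_keys(samples)))              # ['a', 'b', 'c']
print(list(iter_sample_keys(samples, local=True)))  # ['a', 'c']
```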

Improvements

Removed msgpack and pyyaml dependencies. Cleaned up and improved remote client/server code. #130 @rlizzo
Multiprocess Torch DataLoaders allowed on Linux and MacOS. #144 @rlizzo
Added CLI commands commit, checkout, arrayset create, and arrayset remove. #150 @rlizzo
Plugin system revamp. #134 @hhsecond
Documentation improvements and typo fixes. #156 @alessiamarcolini
Arrayset schemas are no longer implicitly removed from a checkout when every sample is removed from the arrayset. The previous behavior could result in dangling accessors which may or may not self-destruct (as expected) in certain edge cases. #159 @rlizzo
Added type codes to hash digests so that calculation function can be updated in the future without breaking repos written in previous Hangar versions. #165 @rlizzo
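A type code records which function produced a digest. Digests written before a change of default can therefore still be verified. The sketch assumes hypothetical codes "0" and "1" mapped to BLAKE2b and SHA-256; Hangar's actual codes and hash functions may differ.

```python
import hashlib
from typing import Callable, Dict

# Hypothetical type codes; Hangar's actual codes and algorithms may differ.
HASH_FUNCS: Dict[str, Callable[[bytes], str]] = {
    "0": lambda data: hashlib.blake2b(data, digest_size=20).hexdigest(),
    "1": lambda data: hashlib.sha256(data).hexdigest(),
}
DEFAULT_CODE = "1"


def make_digest(data: bytes, code: str = DEFAULT_CODE) -> str:
    """Prefix the hex digest with the code of the function that produced it."""
    return f"{code}={HASH_FUNCS[code](data)}"


def verify_digest(data: bytes, digest: str) -> bool:
    """Recompute with whichever function the stored prefix names."""
    code, _, hexdigest = digest.partition("=")
    return HASH_FUNCS[code](data) == hexdigest


old = make_digest(b"sample", code="0")  # e.g. written by an older version
new = make_digest(b"sample")            # written with the current default
assert verify_digest(b"sample", old) and verify_digest(b"sample", new)
```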

Bug Fixes

Programmatic access to repository log contents now returns branch heads alongside other log info. #125 @rlizzo
Fixed minor bug in types of values allowed for Arrayset names vs Sample names. #151 @rlizzo
Fixed issue where using checkout object to access a sample in multiple arraysets would try to create a namedtuple instance with invalid field names. Now incompatible field names are automatically renamed with their positional index. #161 @rlizzo
Explicitly raise error if commit argument is set while checking out a repository with write=True. #166 @rlizzo
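For the namedtuple fix (#161), the standard library can rename invalid field names by position through collections.namedtuple(..., rename=True). The arrayset names below are hypothetical, and whether Hangar relies on this exact flag is an assumption.

```python
from collections import namedtuple

# Hypothetical arrayset names: "class" is a keyword and "2d_points" starts
# with a digit, so neither is a valid Python identifier.
arrayset_names = ["image", "class", "2d_points"]

try:
    namedtuple("SampleData", arrayset_names)
except ValueError as err:
    print("without rename:", err)

# rename=True replaces each invalid name with an underscore plus its index.
SampleData = namedtuple("SampleData", arrayset_names, rename=True)
print(SampleData._fields)  # ('image', '_1', '_2')

row = SampleData("img-bytes", "cat", [(0, 1)])
print(row.image, row._1)
```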

Breaking changes

New commit reference serialization format is incompatible with repositories written in version 0.3.0 or earlier.

19 Oct 01:51

rlizzo

v0.4.0b0

f1c5d05

v0.4.0b0 Beta Pre-Release (Pre-release)

Merge pull request #145 from rlizzo/version-0-4-0b0

Version 0.4.0b0

10 Sep 07:52

rlizzo

v0.3.0

d337bec

v0.3.0 Release

New Features

Added API for reading and writing arrayset data directly from a checkout object. (#115) @rlizzo
Data importers, exporters, and viewers via CLI for common file formats. Includes a plugin system for easy extensibility in the future. (#103) (@rlizzo, @hhsecond)

Improvements

Added tutorial on working with remote data. (#113) @rlizzo
Added tutorial on TensorFlow and PyTorch dataloaders. (#117) @hhsecond
Large performance improvement to the diff/merge algorithm (~30x faster than before). (#112) @rlizzo
New commit hash algorithm which is much more reproducible in the long term. (#120) @rlizzo
HDF5 backend updated to increase the speed of reading/writing compressed chunks of variable-sized datasets. (#120) @rlizzo
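The release note does not describe the new commit hash algorithm. One common ingredient of a reproducible content hash is a canonical serialization. The sketch below is a hypothetical commit_digest function, not Hangar's algorithm. It sorts records by key and separates fields with a NUL byte, so the digest does not depend on insertion order.

```python
import hashlib
from typing import Dict


def commit_digest(records: Dict[str, str], parent: str = "") -> str:
    """Hypothetical reproducible commit hash over key -> value records.

    Records are fed to the hash in sorted key order, with a NUL byte after
    every field, so the result does not depend on dict insertion order.
    Assumes keys and values never contain a NUL character.
    """
    hasher = hashlib.sha256()
    hasher.update(parent.encode("utf-8") + b"\x00")
    for key in sorted(records):
        hasher.update(key.encode("utf-8") + b"\x00")
        hasher.update(records[key].encode("utf-8") + b"\x00")
    return hasher.hexdigest()


a = commit_digest({"x": "1", "y": "2"})
b = commit_digest({"y": "2", "x": "1"})  # same records, different order
assert a == b
```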

Bug Fixes

Fixed ML dataloader errors for a number of edge cases surrounding partial-remote data and non-common keys. (#110) (@hhsecond, @rlizzo)

Breaking changes

New commit hash algorithm is incompatible with repositories written in version 0.2.0 or earlier.

09 Aug 20:14

rlizzo

v0.2.0

a47aaf0

v0.2.0 Release

See changelog for full details

New Features

NumPy memory-mapped array file backend added.
Remote server data backend added.
Selection heuristics to determine appropriate backend from arrayset schema.
Partial remote clones and fetch operations now fully supported.
CLI has been placed under test coverage, and its usage has been added to the docs.
TensorFlow and PyTorch Machine Learning Dataloader Methods (Experimental Release).

Improvements

Record format versioning and standardization so as not to break backwards compatibility in the future.
Developer protocols and documentation for adding and updating backends.
Read-only checkout arrayset sample get methods are now thread-safe and multiprocess-safe.
Read-only checkout metadata sample get methods are thread safe if used within a context manager.
Samples can be assigned integer names in addition to string names.
Forgetting to close a write-enabled checkout before terminating the python process will close the checkout automatically in many situations.
Repository software version compatibility methods added to ensure upgrade paths in the future.
Many tests added (including support for Mac OS X on Travis-CI).
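Automatic cleanup at interpreter exit can be sketched with the standard library's atexit module. WriterCheckoutSketch and its lock flag are hypothetical. atexit handlers do not run when the process is killed by a signal or exits via os._exit(). Such cleanup can therefore cover many situations, but not all.

```python
import atexit


class WriterCheckoutSketch:
    """Hypothetical write-enabled checkout holding a repository writer lock."""

    def __init__(self) -> None:
        self.lock_held = True  # stand-in for acquiring the writer lock
        self.closed = False
        # Safety net: call close() at normal interpreter exit. atexit keeps
        # a reference to this object until the process exits.
        atexit.register(self.close)

    def close(self) -> None:
        if self.closed:
            return  # idempotent, so the exit-time call after an explicit close is a no-op
        self.lock_held = False
        self.closed = True


co = WriterCheckoutSketch()
# ... write data; if the script ends without co.close(), atexit calls it.
co.close()
```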

Bug Fixes

Diff results for fast-forward merges are now sensible.
Many type annotations added, and developer documentation improved.

Breaking changes

Renamed all references to datasets in the API / world-view to arraysets.
These are backwards-incompatible changes. For all versions > 0.2, repository upgrade utilities will be provided if breaking changes occur.

24 May 18:21

rlizzo

v0.1.1

019fffc

v0.1.1 Release

Fixed typos in the README that was pushed to PyPI.

Releases: tensorwerk/hangar-py
