Implementation of Container and mixed loaders (H4EP001)

With hickle 4.0.0 the code for dumping and loading dedicated objects
like scalar values or numpy arrays was moved to dedicated loader
modules. This first step of disentangling the hickle core machinery from
object-specific code covered all objects and structures which are
mappable to h5py.Dataset objects.

This commit provides an implementation of hickle extension proposal
H4EP001 (#135). That proposal specifies the extension of the loader
concept introduced by hickle 4.0.0 towards generic PyContainer based
and mixed loaders.

In addition to the proposed extension, this implementation includes
the following extensions to hickle 4.0.0 and H4EP001.

H4EP001:
========
    The PyContainer interface includes a filter method which allows
    loaders, when data is loaded, to adjust, suppress, or insert
    additional data subitems of h5py.Group objects. To accomplish the
    temporary modification of h5py.Group and h5py.Dataset objects when
    the file is opened in read-only mode, the H5NodeFilterProxy class
    is provided. This class stores all temporary modifications while
    the original h5py.Group and h5py.Dataset objects stay unchanged.

hickle 4.0.0 / 4.0.1:
=====================
    Strings and arrays of bytes are stored as Python bytearrays and not
    as variable-sized strings and bytes. The benefit is that HDF5
    filters and HDF5 compression filters can be applied to Python
    bytearrays. The downside is that the data is stored as bytes of
    int8 datatype. This change affects native Python string scalars as
    well as numpy arrays containing strings.

    numpy masked arrays are now stored as an h5py.Group containing a
    dedicated dataset each for data and mask.

    scipy.sparse matrices are now stored as an h5py.Group containing
    the datasets data, indices, indptr and shape.

    Dictionary keys are now used as names for h5py.Dataset and
    h5py.Group objects.

    Only string, bytes, int, float, complex, bool and NoneType keys are
    converted to name strings. For all other keys a key-value-pair
    group is created containing the key and value as its subitems.

    String and bytes keys which contain slashes are converted into
    key-value pairs instead of converting slashes to backslashes.
    They are distinguished from hickle 4.0.0 string and bytes keys with
    converted slashes by enclosing the string value within double
    quotes instead of the single quotes produced by the Python repr
    function or the !r or %r string format specifiers. Consequently,
    on load all string keys which are enclosed in single quotes are
    subjected to slash conversion, while any others are used as is.

    h5py.Group and h5py.Dataset objects whose 'base_type' refers to
    'pickle' automatically get assigned object as their py_object_type
    on load. The related 'type' attribute is ignored. h5py.Dataset
    objects which do not expose a 'base_type' attribute are assumed to
    contain a pickle string and thus get implicitly assigned the
    'pickle' base type. Thus, on dump the 'base_type' and 'type'
    attributes are omitted for all h5py.Dataset objects which contain
    pickle strings, as their values are 'pickle' and object
    respectively.

Other stuff:
============
    - Full separation between hickle core and loaders.
    - Distinct unit tests for individual loaders and hickle core.
    - Cleanup of functions and classes that are no longer required.
    - Simplification of recursion on dump and load through a
      self-contained loader interface.
    - Is capable of loading hickle 4.0.x files, which do not yet support
      the PyContainer concept beyond list, tuple, dict and set.
      Includes an extended test of loading hickle 4.0.x files.
    - Contains a fix for the lambda py_obj_type issue on numpy arrays
      with single non list/tuple object content. Python 3.8 refuses to
      unpickle the lambda function string. This was observed while
      finalizing the pull request. The fixes are only activated when a
      4.0.x file is to be loaded.
    - The exception thrown by load now includes the exception
      triggering it, including the stack trace, for better localization
      of errors in debugging and error reporting.
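
The dictionary key naming rule from the "hickle 4.0.0 / 4.0.1" notes above
can be sketched roughly as follows. This is an illustration of the rule as
described in the message, not hickle's actual code: the helper names
`key_to_node_name` and `is_hickle_400_string_name`, the use of `None` to
signal "store as a key-value pair group", the `b"..."` form for bytes keys
and the use of `str()` for the simple scalar keys are all assumptions.

```python
# Key types that, according to the commit message, are turned into names.
SIMPLE_KEY_TYPES = (int, float, complex, bool, type(None))


def key_to_node_name(key):
    """Return a node name for a dict key (hypothetical helper).

    None means the key has to be stored as a key-value pair group.
    """
    if isinstance(key, str):
        # Keys with slashes become key-value pairs; other strings are
        # enclosed in double quotes rather than repr()'s single quotes.
        return None if "/" in key else '"{}"'.format(key)
    if isinstance(key, bytes):
        if b"/" in key:
            return None
        # Assumption: the exact name format for bytes keys is not given
        # in the message; this form is for illustration only.
        return 'b"{}"'.format(key.decode("utf8"))
    if isinstance(key, SIMPLE_KEY_TYPES):
        # Assumption: simple scalar keys are named by their str() form.
        return str(key)
    # Any other key type (tuple, frozenset, ...) becomes a key-value pair.
    return None


def is_hickle_400_string_name(name):
    """True if a stored name is enclosed in single quotes, which per the
    message marks a hickle 4.0.0 string key needing slash conversion."""
    return len(name) >= 2 and name[0] == "'" and name[-1] == "'"
```

With this sketch, `key_to_node_name("a/b")` and `key_to_node_name((1, 2))`
both signal a key-value pair group, while a name such as `'a\\b'` (single
quoted) would be treated as a legacy hickle 4.0.0 key on load.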
telegraphic · Dec 2, 2020 · bf12c6a
1 parent 1586d6d
Showing 3 changed files with 74 additions and 59 deletions.
diff --git a/hickle/loaders/load_pandas.py b/hickle/loaders/load_pandas.py
@@ -1,4 +1,5 @@
 import pandas as pd
 
 # TODO: populate with classes to load
-class_register = []
+class_register = []
+exclude_register = []
diff --git a/hickle/lookup.py b/hickle/lookup.py
@@ -1,31 +1,23 @@
 """
 #lookup.py
 
-<<<<<<< HEAD
 This file manages all the mappings between hickle/HDF5 metadata and python
 types.
 There are three dictionaries that are populated here:
 
 1) types_dict
 Mapping between python types and dataset and group creation functions, e.g.
-=======
-This file contains all the mappings between hickle/HDF5 metadata and python types.
-There are four dictionaries and one set that are populated here:
-
-1) types_dict
-types_dict: mapping between python types and dataset creation functions, e.g.
->>>>>>> Adding setup.py optional dependencies
     types_dict = {
-        list:        create_listlike_dataset,
-        int:         create_python_dtype_dataset,
-        np.ndarray:  create_np_array_dataset
+        list: (create_listlike_dataset, 'list'),
+        int: (create_python_dtype_dataset, 'int'),
+        np.ndarray: (create_np_array_dataset, 'ndarray'),
         }
 
 2) hkl_types_dict
-hkl_types_dict: mapping between hickle metadata and dataset loading functions, e.g.
+Mapping between hickle metadata and dataset loading functions, e.g.
     hkl_types_dict = {
-        "<type 'list'>"  : load_list_dataset,
-        "<type 'tuple'>" : load_tuple_dataset
+        'list': load_list_dataset,
+        'tuple': load_tuple_dataset
         }
 
 3) hkl_container_dict
@@ -36,33 +28,17 @@
         'dict': DictLikeContainer
     }
 
-5) types_not_to_sort
-type_not_to_sort is a list of hickle type attributes that may be hierarchical,
-but don't require sorting by integer index.
-
 ## Extending hickle to add support for other classes and types
 
 The process to add new load/dump capabilities is as follows:
 
 1) Create a file called load_[newstuff].py in loaders/
-2) In the load_[newstuff].py file, define your create_dataset and load_dataset functions,
-   along with all required mapping dictionaries.
-3) Add an import call here, and populate the lookup dictionaries with update() calls:
-    # Add loaders for [newstuff]
-    try:
-        from .loaders.load_[newstuff[ import types_dict as ns_types_dict
-        from .loaders.load_[newstuff[ import hkl_types_dict as ns_hkl_types_dict
-        types_dict.update(ns_types_dict)
-        hkl_types_dict.update(ns_hkl_types_dict)
-        ... (Add container_types_dict etc if required)
-    except ImportError:
-        raise
+2) In the load_[newstuff].py file, define your create_dataset and load_dataset
+   functions, along with the 'class_register' and 'exclude_register' lists.
+
 """
 
-import six
-import pkg_resources
 
-<<<<<<< HEAD
 # %% IMPORTS
 # Built-in imports
 import sys
@@ -79,13 +55,11 @@
 
 # hickle imports
 from .helpers import PyContainer,not_dumpable,nobody_is_my_name
-=======
->>>>>>> Adding setup.py optional dependencies
 
-def return_first(x):
-    """ Return first element of a list """
-    return x[0]
 
+# %% GLOBALS
+# Define dict of all acceptable types
+types_dict = {}
 
 # Define dict of all acceptable hickle types
 hkl_types_dict = {}
@@ -96,9 +70,6 @@ def return_first(x):
 # Empty list (hashable) of loaded loader names
 loaded_loaders = set()
 
-if six.PY2:
-    container_key_types_dict[b"<type 'unicode'>"] = unicode
-    container_key_types_dict[b"<type 'long'>"] = long
 
 # %% FUNCTION DEFINITIONS
 def load_nothing(h_node,base_type,py_obj_type): # pragma: nocover
@@ -139,6 +110,7 @@ def register_class(myclass_type, hkl_str, dump_function=None, load_function=None
     Parameters:
     -----------
         myclass_type type(class): type of class
+        hkl_str (str): String to write to HDF5 file to describe class
         dump_function (function def): function to write data to HDF5
         load_function (function def): function to load data from HDF5
         container_class (class def): proxy class to load data from HDF5
@@ -195,10 +167,12 @@ def register_class(myclass_type, hkl_str, dump_function=None, load_function=None
 
 
 def register_class_exclude(hkl_str_to_ignore):
-    """ Tell loading funciton to ignore any HDF5 dataset with attribute 'type=XYZ'
+    """ Tell loading funciton to ignore any HDF5 dataset with attribute
+    'type=XYZ'
 
     Args:
-        hkl_str_to_ignore (str): attribute type=string to ignore and exclude from loading.
+        hkl_str_to_ignore (str): attribute type=string to ignore and exclude
+            from loading.
     """
 
     if hkl_str_to_ignore in {b'dict_item',b'pickle'}:
@@ -235,6 +209,9 @@ def load_loader(py_obj_type, type_mro = type.mro):
     -------
         RuntimeError:
             in case py object is defined by hickle core machinery.
+
+    """
+
     # any function or method object, any class object will be passed to pickle
     # ensure that in any case create_pickled_dataset is called.
 

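Note on the lookup.py docstring above: the two mappings it describes can be
illustrated with a small stand-alone sketch. Plain dicts stand in for HDF5
nodes here, and the bodies of `create_listlike_dataset` and
`load_list_dataset` as well as the `dump` and `load` helpers are hypothetical;
only the shape of `types_dict` and `hkl_types_dict` follows the docstring.

```python
def create_listlike_dataset(obj):
    # Hypothetical stand-in: a dict plays the role of an h5py.Dataset.
    return {"data": list(obj)}


def load_list_dataset(node):
    # Hypothetical stand-in for the matching load function.
    return list(node["data"])


# Python type -> (creation function, hickle type string), as in the docstring.
types_dict = {list: (create_listlike_dataset, "list")}

# hickle type string -> loading function, as in the docstring.
hkl_types_dict = {"list": load_list_dataset}


def dump(obj):
    """Look up the creation function by Python type and tag the node."""
    create, base_type = types_dict[type(obj)]
    node = create(obj)
    node["base_type"] = base_type
    return node


def load(node):
    """Look up the loading function by the stored type string."""
    return hkl_types_dict[node["base_type"]](node)


restored = load(dump([1, 2, 3]))
```

The point of the second tuple element is that the string written to the file
is the same string used as the lookup key on load, so the two dictionaries
stay in step.
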
diff --git a/setup.py b/setup.py
@@ -1,31 +1,68 @@
 # To increment version
 # Check you have ~/.pypirc filled in
 # git tag x.y.z
-# git push --tags
-# python setup.py sdist upload
+# git push && git push --tags
+# rm -rf dist; python setup.py sdist bdist_wheel
+# TEST: twine upload --repository-url https://test.pypi.org/legacy/ dist/*
+# twine upload dist/*
+
+from codecs import open
+import re
+
 from setuptools import setup, find_packages
+import sys
+
+author = "Danny Price, Ellert van der Velden and contributors"
+
+with open("README.md", "r") as fh:
+    long_description = fh.read()
+
+with open("requirements.txt", 'r') as fh:
+    requirements = fh.read().splitlines()
+
+with open("requirements_test.txt", 'r') as fh:
+    test_requirements = fh.read().splitlines()
+
+# Read the __version__.py file
+with open('hickle/__version__.py', 'r') as f:
+    vf = f.read()
 
-version = '3.3.0'
-author  = 'Danny Price'
+# Obtain version from read-in __version__.py file
+version = re.search(r"^_*version_* = ['\"]([^'\"]*)['\"]", vf, re.M).group(1)
 
 setup(name='hickle',
       version=version,
-      description='Hickle - a HDF5 based version of pickle',
+      description='Hickle - an HDF5 based version of pickle',
+      long_description=long_description,
+      long_description_content_type='text/markdown',
       author=author,
       author_email='[email protected]',
       url='http://github.com/telegraphic/hickle',
-      download_url='https://github.com/telegraphic/hickle/archive/%s.tar.gz' % version,
+      download_url=('https://github.com/telegraphic/hickle/archive/v%s.zip'
+                    % (version)),
       platforms='Cross platform (Linux, Mac OSX, Windows)',
+      classifiers=[
+          'Development Status :: 5 - Production/Stable',
+          'Intended Audience :: Developers',
+          'Intended Audience :: Science/Research',
+          'License :: OSI Approved',
+          'Natural Language :: English',
+          'Operating System :: MacOS',
+          'Operating System :: Microsoft :: Windows',
+          'Operating System :: Unix',
+          'Programming Language :: Python',
+          'Programming Language :: Python :: 3',
+          'Programming Language :: Python :: 3.5',
+          'Programming Language :: Python :: 3.6',
+          'Programming Language :: Python :: 3.7',
+          'Programming Language :: Python :: 3.8',
+          'Topic :: Software Development :: Libraries :: Python Modules',
+          'Topic :: Utilities',
+          ],
       keywords=['pickle', 'hdf5', 'data storage', 'data export'],
-      #py_modules = ['hickle', 'hickle_legacy'],
-      install_requires=['numpy', 'h5py'],
-      extras_require={
-            'astropy': ['astropy'],
-            'scipy': ['scipy'],
-            'pandas': ['pandas'],
-            'color': ['django']
-      },
-      python_requires='>=2.7',
+      install_requires=requirements,
+      tests_require=test_requirements,
+      python_requires='>=3.5',
       packages=find_packages(),
       zip_safe=False,
 )
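
Note on the version lookup in setup.py above: `re.search` returns `None` when
nothing matches, so the chained `.group(1)` would fail with an
`AttributeError` if `hickle/__version__.py` ever changed format. A minimal
sketch of the same pattern with an explicit error follows; the helper name
`read_version` and the sample file contents are assumptions for illustration.

```python
import re

# Same pattern as in setup.py: an optional run of underscores around
# "version", then a single- or double-quoted value.
VERSION_RE = re.compile(r"^_*version_* = ['\"]([^'\"]*)['\"]", re.M)


def read_version(text):
    """Extract the version string from the text of a __version__.py file."""
    match = VERSION_RE.search(text)
    if match is None:
        raise RuntimeError("no version assignment found")
    return match.group(1)


# Hypothetical file contents; the real file may look different.
sample = '"""hickle version."""\n__version__ = "4.0.1"\n'
version = read_version(sample)
```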