Hickle 5 rc #149
Conversation
With hickle 4.0.0 the code for dumping and loading dedicated objects like scalar values or numpy arrays was moved to dedicated loader modules. This first step of disentangling the hickle core machinery from object-specific code included all objects and structures which were mappable to h5py.Dataset objects. This commit provides an implementation of hickle extension proposal H4EP001 (telegraphic#135), in which the extension of the loader concept introduced by hickle 4.0.0 towards generic PyContainer-based and mixed loaders is specified. In addition to the proposed extension, this implementation includes the following extensions of hickle 4.0.0 and H4EP001:

H4EP001:
========
The PyContainer interface includes a filter method which allows loaders, when data is loaded, to adjust, suppress, or insert additional data subitems of h5py.Group objects. To accomplish the temporary modification of h5py.Group and h5py.Dataset objects when the file is opened in read-only mode, the H5NodeFilterProxy class is provided. This class stores all temporary modifications while the original h5py.Group and h5py.Dataset objects stay unchanged.

hickle 4.0.0 / 4.0.1:
=====================
Strings and arrays of bytes are stored as Python bytearrays and not as variable-sized strings and bytes. The benefit is that hdf5 filters and hdf5 compression filters can be applied to Python bytearrays. The downside is that the data is stored as bytes of int8 datatype. This change affects native Python string scalars as well as numpy arrays containing strings.

numpy masked arrays are now stored as an h5py.Group containing a dedicated dataset each for data and mask.

scipy.sparse matrices are now stored as an h5py.Group containing the datasets data, indices, indptr and shape.

Dictionary keys are now used as names for h5py.Dataset and h5py.Group objects. Only string, bytes, int, float, complex, bool and NoneType keys are converted to name strings; for all other keys a key-value-pair group is created containing the key and value as its subitems. String and bytes keys which contain slashes are converted into key-value pairs instead of converting slashes to backslashes. The distinction from hickle 4.0.0 string and bytes keys with converted slashes is made by enclosing the string value within double quotes instead of the single quotes produced by the Python repr function or the !r and %r string format specifiers. Consequently, on load, all string keys which are enclosed in single quotes are subjected to slash conversion while any others are used as is.

h5py.Group and h5py.Dataset objects whose 'base_type' attribute refers to 'pickle' automatically get assigned object as their py_obj_type on load; the related 'type' attribute is ignored. h5py.Dataset objects which do not expose a 'base_type' attribute are assumed to contain a pickle string and thus implicitly get assigned the 'pickle' base type. Consequently, on dump, the 'base_type' and 'type' attributes are omitted for all h5py.Dataset objects which contain pickle strings, as their values would be 'pickle' and object respectively.

Other stuff:
============
- Full separation between hickle core and loaders
- Distinct unit tests for individual loaders and hickle core
- Cleanup of no longer required functions and classes
- Simplification of recursion on dump and load through a self-contained loader interface
- Capable of loading hickle 4.0.x files which do not yet support the PyContainer concept beyond list, tuple, dict and set; includes extended tests for loading hickle 4.0.x files
- Contains a fix for the lambda py_obj_type issue on numpy arrays with single non-list/tuple object content. Python 3.8 refuses to unpickle the lambda function string; this was observed while finalizing the pull request. The fixes are only activated when a 4.0.x file is to be loaded.
- Exceptions thrown by load now include the triggering exception, including its stack trace, for better localization of errors in debugging and error reporting.
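The dict-key naming rule above can be sketched as a small helper. This is a hypothetical illustration of the rule as described (function name and exact formatting are not hickle's actual implementation): simple scalar keys become quoted name strings, while keys containing slashes or non-scalar keys fall back to a key-value-pair group.

```python
# Hypothetical sketch of the dict-key naming rule described above;
# names and exact rules in hickle's actual loaders differ.
def key_to_name(key):
    """Return a group/dataset name for `key`, or None if a
    key-value-pair group is needed instead."""
    if isinstance(key, (str, bytes)):
        text = key.decode("utf8", "backslashreplace") if isinstance(key, bytes) else key
        if "/" in text:            # a slash would split the HDF5 path
            return None            # -> store as key-value-pair group
        return '"{}"'.format(text)  # double quotes mark a converted key
    if isinstance(key, (int, float, complex, bool)) or key is None:
        return repr(key)
    return None                    # any other key type -> key-value-pair group
```

A key like `"a/b"` therefore yields a key-value-pair group instead of a mangled name, which is what distinguishes this scheme from the 4.0.0 slash-to-backslash conversion.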
What is commit d1760ee doing in here?
Shit, definitely some rebase error; not sure how to get it out again. Shouldn't be there. Too late for today.
Also, given that this would be
It does, and the differences are small enough that they can easily be handled by stacking the few specialized loader functions on top of the default loader functions, like compact_expand, which is also implemented as an optional loader that is stacked on top of all others on request. Further, I fear that the issue I observed only when running tests in Python 3.6 and 3.7 in connection with h5py 3 would not vanish by pulling hickle 4 files into dedicated legacy loading. And this way, users who have not yet had time to convert their files from hickle 3 to 4 can still do so using a matured hickle 5. Thus, unless there are severe obstacles and issues against it, I would prefer to keep support as implemented ;-)
It is not like v5 will release any time soon, as many, many changes have to be made to this before it can be merged into
I didn't expect it to be, and as long as I'm able to keep in sync with your changes on dev it should be quite easy for me to take care of the amendments necessary to reach a releasable state. Just let me know where you have doubts, where you think improvement is necessary, or where things have to be pushed further. I'm open to any input and hints. The only thing I want to avoid is that sooner or later there exist two incompatible versions of hickle, the official one and mine; even if I start to use mine for now, all findings will be reported here and go back into the branch.
Thanks @hernot, looking forward to this coming to fruition.
Looks like things change these days. Not sure which is better:
h5py version limited to <3.x according to issue telegraphic#143.
…and type)

Basic Memoisation:
==================
Both types of memoisation are handled by the ReferenceManager dictionary-type object. For storing object instance references it is used as a Python dict object which, when dumping, stores the references to the py_obj and its related node using id(py_obj) as the key. On load, the id of the h_node is used as the key for storing the to-be-shared reference of the restored object. Additional references to the same object are represented by h5py.Datasets with their dtype set to ref_dtype. They are created by assigning an h5py.Reference object, as returned by the h5py.Dataset.ref or h5py.Group.ref attribute. These datasets are resolved by the filter iterator method of the ExpandReferenceContainer class and returned as sub_items of the reference dataset.

Type Memoisation:
=================
The 'type' attribute of all nodes, except datasets which contain pickle strings or expose ref_dtype as their dtype, now contains a reference to the appropriate py_obj_type entry in the global 'hickle_types_table'. This table hosts the datasets representing all py_obj_types and base_types encountered by hickle.dump, each exactly once. Each py_obj_type is represented by a numbered dataset containing the corresponding pickle string. The base_types are represented by empty datasets whose names are the names of the base_types as defined by the class_register tables of the loaders. No entry is stored for (object, b'pickle') or (hickle.ReferenceType, b'!node-reference'), as these can be resolved implicitly on load. The 'base_type' attribute of a py_obj_type entry refers to the base_type used to encode it and required to properly restore it again from the hickle file. The entries in the 'hickle_types_table' are managed by the ReferenceManager.store_type and ReferenceManager.resolve_type methods. The latter also takes care of properly distinguishing pickle datasets from reference datasets and of resolving hickle 4.0.X dict_item groups.

The ReferenceManager is implemented as a context manager and thus can and shall be used within a with statement to ensure proper cleanup. Each file has its own ReferenceManager instance, therefore different data can be dumped to distinct files which are open in parallel. The basic management of managers is provided by the BaseManager base class, which can be used to build further managers, for example to allow loaders to be activated only when specific feature flags are passed to the hickle.dump method or encountered by hickle.load in the file attributes. The BaseManager class has to be subclassed.

Other changes:
==============
- lookup.register_class and class_register tables have an additional memoise flag indicating whether py_obj shall be remembered for representing and resolving multiple references to it, or whether it shall be dumped and restored every time it is encountered
- lookup.hickle_types table entries include the memoise flag as their third entry
- lookup.load_loader: the tuple returned in addition to py_obj_type includes the memoise flag
- hickle.load: whether to use the load_fn stored in lookup.hkl_types_table or a PyContainer object stored in hkl_container_dict is decided upon the is_container flag returned by ReferenceManager.resolve_type, instead of checking whether the processed node is of type h5py.Group
- dtype of string and bytes datasets is now set to 'S1' instead of 'u8' and the shape is set to (1, len)
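The dump-side memoisation idea above can be sketched in a few lines. This is an illustrative sketch only (the class name MemoSketch and the node path strings are invented for the example; hickle's ReferenceManager does more, including the load-side table and HDF5 reference datasets): a dict keyed by id(py_obj) lets a second encounter of the same object be dumped as a reference to the already-written node.

```python
# Minimal sketch of dump-side memoisation as described above.
# MemoSketch and the path strings are illustrative, not hickle's API.
class MemoSketch(dict):
    def remember(self, py_obj, node):
        # store (py_obj, node): keeping a strong reference to py_obj
        # guarantees its id() is not reused while dumping
        self[id(py_obj)] = (py_obj, node)

    def shared_node_for(self, py_obj):
        entry = self.get(id(py_obj))
        return entry[1] if entry is not None else None

memo = MemoSketch()
shared = [1, 2, 3]
memo.remember(shared, "/data/list0")          # first encounter creates the node
node = memo.shared_node_for(shared)           # later encounters reuse it
```

On load the roles are mirrored: the h_node's id is the key and the restored Python object is the value, so reference datasets can be resolved to already-restored objects.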
Just to let you know, I am currently amending commit 940710c to finally get rid of the copy protocol mimicry, as it turns out that it is not necessary to achieve what is intended by H4EP003 #145 (see latest edits of #145). As soon as I am finished with the amendment I will take care of the rest, for which I would appreciate advice on whether to issue limited versions on win32 or whether to drop win32 support for the proposed hickle 5 release candidate.
As 32-bit Windows versions are still widely used, I think support for them should not be dropped.
Will check how the win32 issue can be solved with least effort and maximum gain.
…ives

In the current release custom objects are simply converted into a binary pickle string. This commit implements the H4EP003 issue telegraphic#145 register_class-based alternative featuring custom loader modules. In hickle 4.x custom loader functions can be added to hickle by explicitly calling hickle.lookup.register_class before calling hickle.dump and hickle.load. In H4EP003 an alternative approach was proposed, the hickle-specific compact_expand protocol mimicking the Python copy protocol. It was found that this mimicry does not provide any benefit compared to the hickle.lookup.register_class-based approach. Even worse, hickle users have to litter their class definitions with hickle-only loader methods called __compact__ and __expand__, and in addition have to register their class for compact_expand and activate the compact_expand loader option to activate the use of these two methods.

This commit implements hickle.lookup.register_class package and program loader module support. load_<package>.py modules may, in addition to the hickle.loaders directory, also be stored along with the Python base package, module or program main script. In order to keep the directory structure clean, load_<package>.py modules must be stored within the special hickle_loaders subdirectory at that level.

Example package loader:
-----------------------
dist_packages/
+- ...
+- <my_package>/
|  +- __init__.py
|  +- sub_module1.py
|  +- ...
|  +- sub_moduleN.py
|  +- hickle_loaders/
|     +- load_<my_package>.py
+- ...

Example single module loader:
-----------------------------
dist_packages/
+- ...
+- <my_module>.py
+- ...
+- hickle_loaders/
|  +- ...
|  +- load_<my_module>.py
|  +- ...
+- ...

Example program main (package) loader:
--------------------------------------
bin/
+- ...
+- <my_single_file_program>.py
+- hickle_loaders/
|  +- ...
|  +- load_<my_single_file_program>.py
|  +- ...
+- ...
+- <my_program>/
|  +- ...
|  +- <main>.py
|  +- ...
|  +- hickle_loaders/
|  |  +- ...
|  |  +- load_<main>.py
|  |  +- ...
|  +- ...
+- ...

Fallback loader recovering data:
--------------------------------
Implements special AttemptsRecoverCustom types used as replacements when Python objects are missing or incompatible with the stored data. The affected data is loaded as RecoveredGroup (dict type) or RecoveredDataset (numpy.ndarray type) objects. Attached to either is the attrs attribute as found on the corresponding h5py.Group or h5py.Dataset in the hickle file.

LoaderManager:
==============
The LoaderManager-based approach allows adding further optional loader sets. For example, when loading a hickle 4.0.X file the corresponding loader set is added implicitly to ensure 'DictItem' and other helper types specific to hickle 4.0.X are properly recognized and the corresponding data is properly restored. Only optional loaders (except legacy loaders provided by hickle core, currently 'hickle-4.0') listed in the 'optional_loaders' exported by hickle.loaders.__init__.py are considered valid. A class_register table entry can be assigned to a specific optional loader by specifying the loader name as its 7th item. Any other entry which has fewer than 7 items, or whose 7th item reads None, is included in the set of global loaders.

@hernot
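The directory layouts above all follow one convention: a load_<name>.py inside a hickle_loaders/ subdirectory next to the package, module, or script it serves. A small sketch of how such a companion path could be derived from a module's __file__ (the function name is hypothetical, not hickle's actual lookup code):

```python
# Hypothetical sketch of the loader path convention shown in the
# directory trees above; not hickle's actual discovery code.
from pathlib import Path

def loader_path_for(module_file):
    """Given a module's __file__, return the expected companion loader path."""
    p = Path(module_file)
    if p.name == "__init__.py":
        # package: the loader is named after the package directory
        # and lives inside the package itself
        name, base = p.parent.name, p.parent
    else:
        # plain module or script: the loader lives in a sibling
        # hickle_loaders/ directory
        name, base = p.stem, p.parent
    return base / "hickle_loaders" / f"load_{name}.py"
```

For `/x/my_package/__init__.py` this yields `/x/my_package/hickle_loaders/load_my_package.py`, matching the first tree above.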
Hickle 4.0.x accepts file-like objects but uses them only as a model for creating a new file, discarding the file handler passed in. If the file is accessed later on using the same file handler, this results in an IOError or alike being raised. The proposed fix does not close the file but checks that it is opened at least readable if mode is 'r', or readable and writable in case mode reads 'w', 'r+', 'w+', 'x', 'x+' or 'a'. Note: the passed-in file or file-like object has to be closed by the caller.
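The mode check described above could look roughly like this. This is an assumed sketch of the logic (function name and error messages are illustrative, not the actual fix): validate the passed-in object instead of recreating the file, and leave closing it to the caller.

```python
# Sketch of the file-object mode check described above; assumed
# logic, not hickle's exact implementation.
import io

def check_file_object(f, mode):
    """Verify a passed-in file-like object is usable for `mode`;
    the caller remains responsible for closing it."""
    if mode == "r":
        if not f.readable():
            raise ValueError("file must be opened readable for mode 'r'")
    elif mode in ("w", "r+", "w+", "x", "x+", "a"):
        if not (f.readable() and f.writable()):
            raise ValueError("file must be opened readable and writable")
    else:
        raise ValueError(f"unknown mode {mode!r}")
    return f
```

An `io.BytesIO()` passes for every mode since it is both readable and writable, while a file opened with `open(path, 'rb')` would be rejected for 'w'.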
Compression keywords safety tests:
==================================
In issue telegraphic#140 it was reported that some loaders crash when 'compression', 'chunks' and related h5py keyword arguments are specified. By running pytest a second time, thereby specifying the custom --enable-compression parameter, all tests are rerun with kwargs={'compression':'gzip', 'compression_opts':6}. All compression-sensitive tests, especially all 'test_XX_*.py::*' unit test functions, must include the 'compression_kwargs' parameter in their signature to receive the actual kwargs list to be passed to all 'create_fcn' functions defined by a loader module. In case a test function misses passing on 'compression_kwargs' as keyword arguments ('**kwargs') to 'hickle.dump', 'hickle._dump', or any dump_method listed in the 'class_register' table of a loader module or specified directly in 'LoaderManager.register_class', an AssertionError exception is thrown indicating the name of the test function, the line in which the affected function is called and any function which it calls. Tests which either test compression-related issues explicitly or do not call any of the dump functions may be marked accordingly using the 'pytest.mark.no_compression' marker to explicitly exclude the test function from compression testing.

Tox virtual env manager support:
================================
Adds support for the virtualenv manager tox. Tox simplifies local testing of compatibility with multiple Python versions before pushing to GitHub and creating a pull request. Travis and Appveyor integration still has to be tested and verified.
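The forwarding requirement above is the heart of the check: the test's compression_kwargs must travel all the way down to the dataset creation call. A stripped-down illustration (all names here are invented for the sketch; the real harness hooks hickle's dump path and h5py):

```python
# Illustrative sketch, not hickle's actual test harness: the
# compression kwargs handed to a test must reach the (faked here)
# h5py create_dataset call unchanged.
captured = {}

def fake_create_dataset(name, data, **kwargs):
    # stands in for h5py.Group.create_dataset in this sketch
    captured.update(kwargs)
    return (name, data)

def dump_sketch(data, compression_kwargs=None, **kwargs):
    # a well-behaved dump forwards the compression kwargs downstream
    kwargs.update(compression_kwargs or {})
    return fake_create_dataset("data", data, **kwargs)

dump_sketch([1, 2], compression_kwargs={"compression": "gzip", "compression_opts": 6})
```

A test that accepted compression_kwargs but silently dropped it would leave `captured` empty, which is exactly the situation the AssertionError described above is meant to flag.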
The current release causes problems when trying to dump and load data when h5py 3.X or newer is installed. In that case loading will fail with errors. This is caused by:

- a change in how strings are stored in attributes. Before 3.x they were stored as byte strings. Encoding had to happen manually before storing to and after reading from the attribute. In h5py 3 encoding is handled by attrs.__setitem__ and attrs.__getitem__. The latter simply assumes that any byte string stored as an attribute value must represent a Python 'str' object, which either was written by Python 2, which used plain C strings, or is the encoded version of a Python 3 utf8 string and thus has to be decoded from utf8. Only in case the byte string contains characters which are not valid in a utf8-encoded string is the content returned as a 'bytes' object.

- a change in how utf8 'str' objects are stored. They are now consistently stored as var_dtype objects where the metadata of the dtype indicates the resulting Python type. In h5py 2 this didn't matter; therefore strings stored in datasets using h5py 2 are returned with an opaque object dtype for which decoding to utf8 has to be enforced if the py_obj_type, as indicated by the 'type' attribute, is 'str'.

- a change in how datasets for which no explicit link exists in the file, but to which a strong Python reference to the h5py.Dataset or h5py.Group object exists, are handled when a reference to them is resolved. In h5py 2 a ValueError is raised; in h5py 3.X an anonymous Dataset or Group is returned. Consequently, when simulating stale references encountered by hickle.load during test runs, all Python references to the corresponding datasets and groups have to be dropped to see that the reference is stale. The alternative would be to check the name of the referred-to Dataset: if it is None the dataset would be anonymous. Resolving the name on non-anonymous Datasets and Groups, which always represents the full path name and not just the local name, is a quite expensive and costly process in terms of runtime and resources and shall be avoided.

This commit provides fixes for these changes severely affecting proper operation of hickle. String values of attributes are compared against 'bytes' and 'str' type variants of constant strings. Lookup tables like key_base_type, hkl_types and hkl_container_types include the key for an entry as both 'str' and 'bytes' type keys. A featured optional load_fcn is provided fixing the difference in how the dtype of utf8 strings is computed by h5py 2 and h5py 3. Test functions are updated accordingly to properly mock stale reference conditions and changes in how strings are stored.
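The h5py 3 attribute decoding behaviour described in the first bullet can be shown in a simplified sketch (this is a rough model of the behaviour as described above, not h5py's actual code; consult the h5py docs for the real rules):

```python
# Simplified model of the h5py 3 attribute read behaviour described
# above: byte strings are assumed to be utf8-encoded 'str' values
# unless decoding fails, in which case raw bytes come back.
def decode_attr_sketch(value):
    if isinstance(value, bytes):
        try:
            return value.decode("utf-8")
        except UnicodeDecodeError:
            return value        # invalid utf8: hand back the raw bytes
    return value
```

This is why code written for h5py 2, which compared attribute values against bytes constants, breaks under h5py 3: the same stored attribute now comes back as 'str', hence the fix of comparing against both 'bytes' and 'str' variants.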
Ok, fixes done. The numpy dependency is still not changed because it is a bit affected by the issue current dill versions have with dumping

After testing, the following options are feasible and reasonable for hickle. Need to know which route to go:
… adding dill weed
…cleanup. Without package, module and script loader support is broken
…properly and stable under any circumstances?
… collector, try forcing collection after file flushing and before closing
…ves the problem too
On GitHub Actions windows runners it was observed for Python 3.7 32-bit and 64-bit and Python 3.8 64-bit that h5py.copy aborted with a PermissionError exception when copying hdf5 files which are opened for writing. On the remaining three windows runners, as well as the linux and mac runners, copying was possible without any issue. Adapted the test_ReferenceManager_context and test_ReferenceManager tests to close the file accessed through the h5_data fixture before calling shutil.copy. Fixed spurious 'Unrecognized typecode' ValueError exceptions occurring during h5py.File.close. These are caused by the Python garbage collector kicking in while h5py.close is in a critical section. By disabling garbage collection before the call to h5py.File.close and re-enabling it afterwards, this can be solved for all Python versions. This error could be observed most consistently on windows 32-bit running Python 3.7 with h5py 2.10.0.
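The disable-GC-around-close workaround described above can be captured in a small context manager. This is a sketch of the idea only (the gc_paused helper and DummyFile are invented for the illustration; the real fix wraps h5py.File.close):

```python
# Sketch of the workaround described above: suspend the garbage
# collector around a close() call so it cannot run inside the
# library's critical section. gc_paused/DummyFile are illustrative.
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()

class DummyFile:
    closed = False
    def close(self):
        # in the real fix this would be h5py.File.close
        assert not gc.isenabled()   # GC must be off in here
        self.closed = True

f = DummyFile()
with gc_paused():
    f.close()
```

Restoring the previous state in `finally` (rather than unconditionally calling gc.enable()) keeps the helper safe to nest and exception-safe.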
Hi @hernot -- catching up on this now. I think a larger issue is that Numpy dtypes should be handled by create_np_dtype. Any idea why this isn't happening?
@telegraphic If you look at

Dill does not make use of this table, so it is not able to properly handle this case. I had a discussion last weekend with the numpy and dill projects, with the result that the dill project has added a workaround for exactly this case which will be available in the next release. Checking the workaround, it is not the only one the dill project uses to handle special cases like the new way numpy encodes the classes of its

In the light of the now official possibility to add custom loaders to hickle outside official

This, on the other hand, will sooner or later limit the outreach and use-cases of hickle, which, as far as I understand, was and is intended by you @telegraphic and @1313e to be as compatible with standard pickle as possible and feasible. Therefore users have the ability to add their own loaders for their very local and specific classes and cases. If they are not picklable by pickle, then hickle should not handle them either. But if they are properly handled by standard pickle, I would expect hickle to handle them too, and I would be surprised if it didn't just because dill refuses to pickle the class object of my fancy object.

Hope I could shed some light on the core of the issue and what breaks when changing the requirements for numpy from
I think with all that, we should drop
Thanks @hernot for the explanation! If

I think pickling with protocol=4 is a good option? (https://www.python.org/dev/peps/pep-3154/) (Assuming this plays nicely with numpy @hernot?)
@telegraphic Sorry, I misread your question at first. Normally pickle will anyway select the highest version available in the given Python version. In Python 3.5, 3.6 and 3.7 the default is 3 and the highest is 4, whereas from 3.8 on the default is 4 and the highest is 5.
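The protocol numbers above are easy to check against any interpreter; a protocol=4 payload round-trips fine, and the module constants show what the running Python defaults to:

```python
# Quick check of the protocol versions discussed above. Protocol 4
# (PEP 3154) is available from Python 3.4 on; the default varies by
# interpreter version (3 on 3.5-3.7, 4 or higher on 3.8+).
import pickle

data = {"a": [1, 2, 3]}
blob = pickle.dumps(data, protocol=4)   # pin protocol 4 explicitly
restored = pickle.loads(blob)
# DEFAULT_PROTOCOL <= HIGHEST_PROTOCOL always holds
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)
```

Pinning the protocol explicitly, as above, is what makes files written on a newer Python readable on an older one.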
…irements updated to allow removal at any time from requirements.txt and requirements32.txt, while it stays a requirement for testing. Removed the explicit six requirement as no loader is yet available for it and it is likely meanwhile installed anyway by other packages in an appropriate version.
Yeah, I am fine with this merging.
@hernot double checking you're done with the dill cleanout and ready for the big merge?
Yes, I'm done.
Ok, here we go: as agreed when discussing PR #138, here is the fully assembled hickle-5-RC branch. In case you prefer to review directly within my repo, feel free to close this PR again. I'm fine with whatever you prefer.
Not sure if all commits reported below are really visible from within this branch or whether the log below is an excerpt of the git reflog.
I also expect appveyor and travis to complain a bit due to the tox-related changes; I would be surprised if things worked on the first try. Anyway, fingers crossed.
The reason for the Python 3.5 failure is known (I can't test it here any more, lacking Python 3.5). The fix is easy, just one change in one line; I will do it if support for Python 3.5 shall be kept beyond hickle 4. No problem at all.
With the astropy warnings I would need your help, as we do not use Astropy here, so I have no clue how to fix them.