
Implementation of Container and mixed loaders (H4EP001) #138

Closed
wants to merge 3 commits into dev from hernot:mixed_container

Conversation

hernot
Contributor

@hernot hernot commented Jul 26, 2020

First:

@1313e, with this pull request I want to express how much I appreciate the really
great work you did for hickle 4.0.0, implementing the first step towards dedicated loaders.

Second: the reason why I'm pushing so hard for the implementation of H4EP001.

The research conducted by the research group I'm establishing and leading
is split into two tracks: a methodological one dealing with the improvement and
development of new algorithms and methods for clinical procedures in diagnostics
and treatment, and a second one concerned with clinical research utilizing the
tools based upon the methods and algorithms provided by the first track.

In the first track, Python, numpy, scipy etc. are the primary tools for working on
the algorithms and investigating new procedures and algorithmic approaches.
The work in the second track is primarily conducted by clinicians; therefore
the tools provided for their research and studies have to be thoroughly tested
and validated. This validation, at least the part which can be automated through
unit tests, utilizes test data, including intermediate data and results obtained
from the Python programs and scripts written during development of the
underlying algorithms.

As the clinical tools are implemented in compiled languages
which support true multi-threading, the data passed on has to be stored in a
file format readable outside Python, ruling out pickle strings. Therefore jsonpickle
was used to dump the data. Meanwhile the amount of data has grown so large
that JSON files, even if compressed using zip, gzip or other compression schemes,
are not feasible any more. NPY and NPZ files, which were the next choice, mandate a
dependency upon the numpy library. Just for conducting unit tests, a self-contained file
format for which only the corresponding library, without any further dependencies, has to be included
would be the better choice.

And this is the point where the HDF5 libraries and hickle come into play. I consider both
the best and most suitable option I have found so far. The current limitation that
objects without a dedicated loader are stored as pickle strings can be solved by
supporting the Python copy protocol, which I hereby offer to contribute to hickle.

Third: the content of this pull request.

Implementation of container-based and mixed loaders as proposed by hickle extension
proposal H4EP001 (#135). For details see the commit message and the proposal #135.

Finally, I recommend:

Not putting this into an official release yet. Some extended tests using a real dataset, compiled
for testing and validating software tools and components developed for use in the clinical track,
showed that an important part is still missing to keep file sizes at a reasonable level. In particular,
type strings and pickle strings for class and function objects currently take up most of the
file space, letting dumped files quickly grow into the GB range even with HDF5 file compression activated,
where the plain pickle stream requires just 400 MB of space.
Therefore I recommend implementing memoization (H4EP002, #139) first before considering
the resulting code base ready for release.

PS: loading of hickle 4.0.0 files should still be possible out of the box. Due to the lack of an appropriate test file, no test is included to verify this.

@hernot hernot changed the base branch from master to dev July 26, 2020 15:25
@codecov

codecov bot commented Jul 26, 2020

Codecov Report

Merging #138 (d4ef711) into dev (3b5efbb) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##               dev      #138   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            9         9           
  Lines          592       654   +62     
=========================================
+ Hits           592       654   +62     
Impacted Files Coverage Δ
hickle/__version__.py 100.00% <100.00%> (ø)
hickle/helpers.py 100.00% <100.00%> (ø)
hickle/hickle.py 100.00% <100.00%> (ø)
hickle/loaders/load_astropy.py 100.00% <100.00%> (ø)
hickle/loaders/load_builtins.py 100.00% <100.00%> (ø)
hickle/loaders/load_numpy.py 100.00% <100.00%> (ø)
hickle/loaders/load_scipy.py 100.00% <100.00%> (ø)
hickle/lookup.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3b5efbb...d4ef711. Read the comment docs.

@telegraphic
Owner

I'll wait for some input from @1313e; I'll only note two small things:

  • code coverage (only one line not covered, so it seems reasonable to push it back to 100%!). L349 in hickle.py is not covered (a test sketch follows this list):
        raise FileError("HDF5-file does not have the proper attributes!")
  • As you noted, we need an appropriate test file for v4.0.0
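
A rough sketch of a test that could exercise that line, assuming a plain HDF5 file without hickle's attributes actually reaches the quoted check; the test name and the use of pytest's tmp_path fixture are illustrative only:

import h5py
import pytest
import hickle

def test_load_rejects_plain_hdf5_file(tmp_path):
    path = str(tmp_path / "plain.h5")
    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=[1, 2, 3])   # no HICKLE_VERSION etc. attributes
    # hickle should refuse the file; FileError per the quoted line, but the
    # broad Exception keeps the sketch robust if an earlier check fires first.
    with pytest.raises(Exception):
        hickle.load(path)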

@hernot
Contributor Author

hernot commented Jul 28, 2020

I'm thinking of trying to get my hands on these two issues in the next few days while we are waiting for input from @1313e. Not sure if I will manage or fail.

Another thing: I have checked the two other pull requests #136 and #137. They are basically the same two minor adaptions of the load_astropy.py loader and the test_astropy.py test. In case you and @1313e agree, I would also include the amendments of those two files in these efforts, especially since I'm adding a test anyway to verify that loading of 4.0.0 files is not broken by the proposed changes. What do you think?

@1313e
Collaborator

1313e commented Jul 28, 2020

#136 and #137 exist because I specifically asked for 2 PRs (I still have to merge them).
They will come before this one gets merged, so some additional merging will be required.

I have just finished an incredibly busy period, so I now finally have time to go and take a look at this.
I will hopefully be able to do so this week.

@hernot
Contributor Author

hernot commented Jul 28, 2020

@1313e: So shall I hold off on the amendments for the uncovered line and the still missing verification of proper loading of 4.0.0 files too, or would you prefer that these amendments are done before reviewing?

@1313e
Collaborator

1313e commented Jul 28, 2020

You can add them just fine, as long as they have nothing to do with #136 and #137, as you will need to merge the master branch into your branch before this PR can be accepted.

@1313e
Collaborator

1313e commented Jul 28, 2020

There you go, now those PRs are merged.
I see that there are some conflicts, so they will have to be resolved first.

@hernot
Contributor Author

hernot commented Jul 28, 2020

I propose the following approach:

  1. I will first fix the uncovered line and come up with an appropriate test for verifying proper loading of 4.0.0 files.
  2. Once 1 is stable, rebase and resolve the conflicts so that this PR no longer conflicts with dev.

Am I missing anything that I should additionally consider and cover?

@1313e
Collaborator

1313e commented Jul 28, 2020

You first want to merge master or dev (both are the same) into your branch, and then fix everything else.

@hernot hernot force-pushed the mixed_container branch from 0b4baac to c53e859 Compare July 28, 2020 18:54
@hernot
Contributor Author

hernot commented Jul 28, 2020

OK, I think the rebase worked. @1313e, thank you very much for the reminder.

@hernot hernot force-pushed the mixed_container branch from c53e859 to a638518 Compare July 29, 2020 20:37
@hernot
Contributor Author

hernot commented Jul 29, 2020

I think I managed: full coverage and proper loading of hickle 4.0.0 and 4.0.1 files. I added a file created with master and a pickled version of the same data, which the test uses to verify that the data were properly restored. I hope I did not miss any tricky obstacle; real-world use will reveal it.
Numpy and pickle are a pain as always, and why 3.8 breaks I have no idea, as it works in 3.6 and 3.7. For today it's already too late to solve. Dropping the pickle file could be possible, as the script creating it is also there, but that error in 3.8... sigh.

@1313e
Collaborator

1313e commented Jul 31, 2020

@hernot Having fun over there? ;)

@hernot
Contributor Author

hernot commented Jul 31, 2020

Not any more; one last squashing and things are ready. I just had to crawl down a tight rabbit hole to find the one nasty error occurring in Python 3.8. And as I do not have a native Python 3.8 installation available here, I have to abuse Travis a bit for this.

The error seemed to be caused by the fact that load_numpy.create_ndarray_dataset creates, on lines 102-105, a 'type' string from a lambda:

            # If not, create a new group and dump py_obj into that
            d = h_group.create_group(name)
            _dump(py_obj, d, **kwargs)
            d.attrs['type'] = np.array(pickle.dumps(lambda x: np.array(x[0])))

which is not used any more in the new container-based scheme. Furthermore, according to the Python docs
(section "What can be pickled and unpickled?") this is not supported; it seems to be enforced now in Python 3.8, or is just unluckily triggered in 3.8 by the new test verifying that hickle 4.0.0 and 4.0.1 files are properly loaded without the need for an additional dedicated legacy loader.
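
A minimal sketch (plain Python, not hickle code) of the two pitfalls at play here: the standard pickle module stores functions by reference (module plus qualified name), so a lambda with no importable name cannot be dumped at all, and a function that was pickled by reference can no longer be loaded once it has vanished from its module:

import pickle
import sys

try:
    pickle.dumps(lambda x: x)                 # no importable name for the lambda
except (pickle.PicklingError, AttributeError) as exc:
    print("dump failed:", exc)

def make_type(x):                             # hypothetical stand-in for the old loader function
    return x

payload = pickle.dumps(make_type)             # stored by reference: ("__main__", "make_type")
del sys.modules["__main__"].make_type         # the function "vanishes" from its module
try:
    pickle.loads(payload)
except AttributeError as exc:                 # e.g. "Can't get attribute 'make_type' ..."
    print("load failed:", exc)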

I will clean up the code, squash the intermediate commits, and re-push one final time when done.

@hernot hernot force-pushed the mixed_container branch from 90bb8e5 to 0cd486d Compare July 31, 2020 18:00
@hernot
Contributor Author

hernot commented Jul 31, 2020

OK, final cleanup and squashing done.

NOTE:
For testing purposes, and thus as an exception from the usual release procedure, I bumped the minor version number to 4.1.0. This allowed me to properly test that the fixes and workarounds needed when loading files created by hickle 4.0.x are activated, and only then. These are especially required when running under Python 3.8, which seems to be stricter about pickling and unpickling lambda functions than earlier versions, even more so when the functions to be unpickled have vanished from the module in which earlier versions of hickle defined them.
This does not in any way pre-empt the decision on the final version number during release. In case bumping the patch number instead of the minor number would be preferred, I will provide any needed support in identifying which items would have to be amended to work properly.

Phew.
Finally awaiting your review and your suggestions for improvement and amendment.

@1313e
Collaborator

1313e commented Jul 31, 2020

Well, I better do some reviewing then huh?
Wonder when I will have the time to review something this big.

@hernot
Contributor Author

hernot commented Aug 1, 2020

Take your time; the next steps will definitely be smaller than the switch to the loader-based design in 4.0.0 and the extension of the PyContainer concept initiated in 4.0.0 and continued now. Further, a big part is covered by the rather thorough (hopefully) unit tests of the individual modules (helpers, lookup, load_builtins, load_numpy, load_scipy, load_astropy, hickle) in addition to the tests already present. These are also the reason why I stumbled over the compression issue related to scalars and strings as reported in #140 and implemented a proposed fix, and some other things which fitted for demonstration purposes (see #125).
EDIT: before the final release, together with memoization, it is necessary to go through all loaders and check for strings and scalars which might otherwise be missed by the then-implemented compression handling fix for #140. I will open an appropriate issue in time when done with the rest.

@1313e 1313e self-requested a review September 1, 2020 01:28
@hernot hernot force-pushed the mixed_container branch 2 times, most recently from 2e46fec to 5d63b64 Compare September 4, 2020 14:04
@hernot
Contributor Author

hernot commented Oct 1, 2020

Hi, just a small ping on how the plans are going.

@hernot hernot force-pushed the mixed_container branch 2 times, most recently from bf12c6a to e4bc715 Compare December 2, 2020 22:41
@hernot
Contributor Author

hernot commented Dec 2, 2020

In the middle of rebasing and removing support for the Python copy protocol. One last step is necessary to make the tests succeed:

  • limit requirements to h5py < 3.x as suggested by @1313e on issue #143. Will continue tomorrow; today it's already too late.

@1313e
Collaborator

1313e commented Dec 3, 2020

@telegraphic Thoughts?

@hernot
Contributor Author

hernot commented Dec 3, 2020

The implementation of Python copy protocol support is now removed, as indicated above and for the reason given when closing issue #125. An idea of how to reach the intended goal in an HDF5-friendly manner is sketched in H4EP003 (#145).

@hernot
Contributor Author

hernot commented Dec 20, 2020

@telegraphic, I know you are quite busy, but nevertheless, is it possible to decide on this in a timely manner? With each fix @1313e necessarily implements, rebasing this pull request and all dependent branches for upcoming pull requests becomes more complicated and tedious.

hernot and others added 3 commits January 5, 2021 21:16
With hickle 4.0.0 the code for dumping and loading dedicated objects
like scalar values or numpy arrays was moved to dedicated loader
modules. This first step of disentangling the hickle core machinery from
object-specific code covered all objects and structures which were mappable
to h5py.Dataset objects.

This commit provides an implementation of hickle extension proposal
H4EP001 (telegraphic#135). In this
proposal the extension of the loader concept introduced by hickle 4.0.0
towards generic PyContainer-based and mixed loaders is specified.

In addition to the proposed extension, this implementation includes
the following extensions to hickle 4.0.0 and H4EP001:

H4EP001:
========
    The PyContainer interface includes a filter method which allows loaders,
    when data is loaded, to adjust, suppress, or insert additional data subitems
    of h5py.Group objects. In order to accomplish the temporary modification
    of h5py.Group and h5py.Dataset objects when the file is opened in read
    only mode, the H5NodeFilterProxy class is provided. This class will
    store all temporary modifications while the original h5py.Group
    and h5py.Dataset objects stay unchanged.

hickle 4.0.0 / 4.0.1:
=====================
    Strings and arrays of bytes are stored as Python bytearrays and not as
    variable-sized strings and bytes. The benefit is that hdf5 filters
    and hdf5 compression filters can be applied to Python bytearrays.
    The downside is that the data is stored as bytes of int8 datatype.
    This change affects native Python string scalars as well as numpy
    arrays containing strings.

    numpy masked arrays are now stored as an h5py.Group containing a
    dedicated dataset each for data and mask.

    scipy.sparse matrices are now stored as an h5py.Group containing
    the datasets data, indices, indptr and shape.

    Dictionary keys are now used as names for h5py.Dataset and
    h5py.Group objects.

    Only string, bytes, int, float, complex, bool and NoneType keys are
    converted to name strings; for all other keys a key-value-pair group
    is created containing the key and value as its subitems.

    String and bytes keys which contain slashes are converted into key
    value pairs instead of converting slashes to backslashes. The
    distinction from hickle 4.0.0 string and bytes keys with converted
    slashes is made by enclosing the string value within double quotes
    instead of single quotes as done by the Python repr function or the
    !r or %r string format specifiers. Consequently, on load all string
    keys which are enclosed in single quotes will be subjected to slash
    conversion while any others will be used as they are.

    h5py.Group and h5py.Dataset objects whose 'base_type' refers to
    'pickle' automatically get assigned object as their py_object_type
    on load. The related 'type' attribute is ignored. h5py.Dataset
    objects which do not expose a 'base_type' attribute are assumed to
    contain a pickle string and thus get implicitly assigned the
    'pickle' base type. Thus, on dump, the 'base_type' and 'type'
    attributes are omitted for all h5py.Dataset objects which contain
    pickle strings, as their values are 'pickle' and object respectively.

Other stuff:
============
    Full separation between hickle core and loaders.

    Distinct unit tests for individual loaders and hickle core.

    Cleanup of no longer required functions and classes.

    Simplification of recursion on dump and load through a self-contained
    loader interface.

    Capable of loading hickle 4.0.x files, which do not yet support the
    PyContainer concept beyond list, tuple, dict and set; includes an
    extended test of loading hickle 4.0.x files.

    Contains a fix for the lambda py_obj_type issue on numpy arrays with
    a single non-list/tuple object content. Python 3.8 refuses to
    unpickle the lambda function string. This was observed while
    finalizing the pull request. The fixes are only activated when a
    4.0.x file is to be loaded.

    The exception thrown by load now includes the exception triggering it,
    including its stack trace, for better localization of the error in
    debugging and error reporting.
Related to issue telegraphic#83, making astropy/scipy optional dependencies.
Can now install e.g. hickle[astropy] to add astropy support.
Uses pkg_resources.require('hickle[astropy]') to check, and only
loads if no error is raised.

    h5py version limited to <3.x according to issue telegraphic#143
@telegraphic
Owner

Hi @hernot and @1313e -- I'm taking a dive into the PR this week. I'm currently on paternity leave so have been even slower to respond than usual (first child so tough learning curve!).

For now, let me say @hernot I really appreciate all the effort and thought you've clearly put into this. Apologies for my tardiness!

@hernot
Contributor Author

hernot commented Jan 13, 2021

@telegraphic no worries, everything is set and prepared here, so I'm just waiting for your go.

@telegraphic
Owner

Making some notes here as I go through:

Compare file structure 4.0.4 and 4.1.0

import numpy as np
import hickle as hkl

# Make a basic test file with 4.1.0 and 4.0.4
a = {'a': 1, 'b': 'hello', 'c': [1, 2, 3]}
b = np.array([1, 2, 3, 5])
c = (a, b)
hkl.dump(c, 'hkl_410.hkl')

# (rinse and repeat for hkl_404.hkl with 4.0.4)

Compare the two file structures:

 h5diff -v hkl_410.hkl hkl_404.hkl

file1     file2
---------------------------------------
    x      x    /
    x      x    /data
    x           /data/data0
    x           /data/data0/"a"
    x           /data/data0/"b"
    x           /data/data0/"c"
    x           /data/data1
           x    /data/data_0
           x    /data/data_0/'a'
           x    /data/data_0/'a'/data
           x    /data/data_0/'b'
           x    /data/data_0/'b'/data
           x    /data/data_0/'c'
           x    /data/data_0/'c'/data
           x    /data/data_1

group  : </> and </>
0 differences found
attribute: <HICKLE_PYTHON_VERSION of </>> and <HICKLE_PYTHON_VERSION of </>>
0 differences found
attribute: <HICKLE_VERSION of </>> and <HICKLE_VERSION of </>>
size:           H5S_SCALAR           H5S_SCALAR
position        HICKLE_VERSION of </> HICKLE_VERSION of </> difference
------------------------------------------------------------
[ 2]            0            1
[ 4]            4            0
2 differences found
group  : </data> and </data>
0 differences found
attribute: <base_type of </data>> and <base_type of </data>>
0 differences found
attribute: <type of </data>> and <type of </data>>
0 differences found

The main difference is data_0 --> data0, and dictionary items no longer have a trailing /data -- no strong feelings on the first, and the latter is definitely an improvement 👍.

The difference in file format means 4.0.4 will not be able to load 4.1.0 files. It fails with a ValueError: Provided argument 'file_obj' does not appear to be a valid hickle file! ('int' object has no attribute 'name'). I suggest we add a check whether the HICKLE_VERSION in the file (e.g. 4.1.0) is greater than the installed version (MAJOR.MINOR, ignoring PATCH), and if loading then fails, print a 'try updating hickle to the latest version' message.
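
Something along these lines, as a rough sketch rather than the actual implementation: warn_if_file_newer is a hypothetical helper, and reading the HICKLE_VERSION root attribute and hickle.__version__ are assumptions based on the h5diff output above:

import h5py
import hickle

def warn_if_file_newer(path):
    with h5py.File(path, 'r') as f:
        file_ver = f.attrs.get('HICKLE_VERSION', '0.0.0')
        if isinstance(file_ver, bytes):
            file_ver = file_ver.decode()
    file_mm = tuple(int(v) for v in file_ver.split('.')[:2])             # MAJOR, MINOR
    installed_mm = tuple(int(v) for v in hickle.__version__.split('.')[:2])
    if file_mm > installed_mm:                                           # PATCH ignored
        print("File written by hickle %s; try updating hickle to the latest version." % file_ver)

warn_if_file_newer('hkl_410.hkl')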

Check 4.1.0 can load 4.0.4 still

Yup (at least for this basic test file).

@telegraphic
Owner

telegraphic commented Jan 16, 2021

Hey @hernot and @1313e, overall I am happy for this to be merged, as it is a precondition for H4EP002 and H4EP003.

What changed and was it worth it?

Firstly, I note that this is a pretty major underlying change to how hickle works under the hood. Even though there are a lot of changes to the loading/dumping logic, changes to the actual file structure are minor. I like the changes to how dictionaries are dumped without extraneous /data, which was made possible by the changes to container loaders. The changes add a level of abstraction, to allow "the possibility to support additional container like objects" in the future. Put another way: the changes are supposed to help make it easy to store custom classes.

Part of the motivation for the changes in H4EP001 was issue #125, to do with the use of __getstate__ and __setstate__ for custom classes that @hernot used in his research. After rather lengthy discussion, #125 was closed with the conclusion that "hdf5 data format is not really designed for storing dict, list and tuple structures containing vast amounts of heterogenous data especially if they are organized in a rather chaotic huge tree like structure". #145 offers an alternative approach for custom classes: a specification that if the class supplies __compact__ and __expand__ dunder methods then hickle will be able to understand and store the class (see new issue #148).
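
For a rough idea of what the proposed convention might look like (purely illustrative: the class, its attributes, and whether __expand__ ends up as a classmethod are assumptions; the actual specification lives in #145):

import numpy as np

class BeamMap:
    """Hypothetical user class holding two numpy arrays."""
    def __init__(self, freqs, gains):
        self.freqs = np.asarray(freqs)
        self.gains = np.asarray(gains)

    def __compact__(self):
        # Reduce the object to hickle-friendly builtins/arrays.
        return {'freqs': self.freqs, 'gains': self.gains}

    @classmethod
    def __expand__(cls, compact):
        # Rebuild the object from its compacted representation.
        return cls(compact['freqs'], compact['gains'])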

So, the big question: are the major changes in H4EP001 worth it? At first glance, the end functionality to the user has not changed!

A quick case study

To weigh this up, I found it useful to look at how the scipy sparse matrix support changed -- see diff here. Sparse matrices are stored in the hdf5 file as three datasets -- which need to be recombined into a single sparse array when loaded by hickle. To allow this previously required the exclude_register in lookup.py. The new functionality implements

class SparseMatrixContainer(PyContainer):

which I think is a slightly better design pattern. However, there is some more complexity in lookup.py to maintain backward compatibility. The register_class method is now:

def register_class(myclass_type, hkl_str, dump_function=None, load_function=None, container_class=None):
    """ Register a new hickle class.
    Parameters:
    -----------
        myclass_type type(class): type of class
        hkl_str (str): String to write to HDF5 file to describe class
        dump_function (function def): function to write data to HDF5
        load_function (function def): function to load data from HDF5
        container_class (class def): proxy class to load data from HDF5
    Raises:
    -------
        TypeError:
            myclass_type represents a py_object the loader for which is to
            be provided by hickle.lookup and hickle.hickle module only
            
    """

Where container_class will take a subclass of PyContainer, such as SparseMatrixContainer. We will need good clear documentation and examples for how to register your own class!
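
To make that concrete, here is a purely illustrative sketch of how a loader for a hypothetical type could be registered under that signature. The import locations, the dump function's exact contract, and the PyContainer hook names (append/convert) are assumptions about this PR's API, not a specification:

from fractions import Fraction
from hickle.helpers import PyContainer      # assumed location of PyContainer
from hickle.lookup import register_class    # register_class lives in lookup.py

def dump_fraction(py_obj, h_group, name, **kwargs):
    # Store numerator and denominator as two datasets inside one group.
    grp = h_group.create_group(name)
    grp.create_dataset('numerator', data=py_obj.numerator)
    grp.create_dataset('denominator', data=py_obj.denominator)
    return grp

class FractionContainer(PyContainer):
    def append(self, name, item, attrs):     # hook name assumed
        setattr(self, name, int(item[()]))
    def convert(self):                       # hook name assumed
        return Fraction(self.numerator, self.denominator)

register_class(Fraction, b'fraction', dump_function=dump_fraction,
               load_function=None, container_class=FractionContainer)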

My conclusions

These changes add complexity, but they do promise to make some things easier in the future. I think, @hernot and @1313e, we can now agree that some data structures in Python are not easy to map to HDF5 optimally without some ugly code...

Overall I am supportive of merging this given H4EP002 and H4EP003, which extend H4EP001 with more tangible improvements. The improvement to the file structure when dumping dictionaries is also nice.

My apologies once again for the latency on the review. Thanks @hernot and @1313e for your patience. Small request: @hernot in the future, can you pretty please make smaller commits instead of one large one so it's easier to review? (I know this is difficult when refactoring, but it would be very helpful!)

@1313e
Collaborator

1313e commented Jan 16, 2021

I am still very sceptical of the usefulness of this PR.
It contains many changes that I would rather not see go through (which cannot be undone easily because this PR is not split into several commits), and the feature implementation as a whole is not complete.
Because of the latter, I think that this PR should not be merged into any branch of hickle until it contains a completely implemented feature.
This will (apparently) require H4EP002 and H4EP003 to be fully implemented.
I suggest that this PR instead be kept on @hernot's fork (given that they are the one working on this), and that only after the aforementioned proposals have been implemented this PR be merged into the main hickle repo.

@hernot
Contributor Author

hernot commented Jan 17, 2021

NOTE:
For testing purposes, and thus as an exception from the usual release procedure, I bumped the minor version number to 4.1.0. This allowed me to properly test that the fixes and workarounds needed when loading files created by hickle 4.0.x are activated, and only then. These are especially required when running under Python 3.8, which seems to be stricter about pickling and unpickling lambda functions than earlier versions, even more so when the functions to be unpickled have vanished from the module in which earlier versions of hickle defined them.
This does not in any way pre-empt the decision on the final version number during release. In case bumping the patch number instead of the minor number would be preferred, I will provide any needed support in identifying which items would have to be amended to work properly.

@1313e @telegraphic, that is why I wrote this note in one of my comments above. I'm perfectly fine if you consider this and all the already prepared follow-up pull requests, which will introduce further, even bigger changes to the file format, as major rather than minor changes, which would require reserving them for a hickle >= 5.0 release instead of just bumping the version 4 minor number. Thinking about it, that might be the wiser option anyway.

@1313e I'm perfectly fine with first assembling all prepared pieces (#138, #139, #145, and clean-up and finalization) in a hickle 5 RC-1 proposal branch in my fork. It is very important to me that I do this in full agreement and coordination with you two, @telegraphic and @1313e.

So may I suggest that I create a Hickle-5-RC branch in my forked repo, add all prepared pull requests there as commits, and post here when it is ready for discussion, either in continuation of this discussion or as part of jointly reviewing the new branch.
Meanwhile, I suggest, as I already did once, prioritizing #141, which just globally makes hickle ignore any compression-related h5py keyword parameter and issues an appropriate warning, as @telegraphic suggested on issue #140. That way we use the Hickle-5-RC branch to prepare hickle 5 without any interference with the current productive version of hickle 4.

@1313e
Collaborator

1313e commented Jan 17, 2021

Yes, that's fine with me.

@1313e 1313e closed this Jan 18, 2021
@1313e
Collaborator

1313e commented Jan 18, 2021

Also, the compression thing I already fixed.

@hernot
Contributor Author

hernot commented Jan 18, 2021

I would have appreciated also getting the OK from @telegraphic before closing.

@hernot hernot mentioned this pull request Jan 18, 2021
@telegraphic
Owner

Just following up: I see @1313e's point that major changes should have functionality improvements, and see you've opened a new PR for a v5 bump -- all sounds good to me. I am more ambivalent about the code changes, so @1313e when we have RC5 ready I'll be relying on you to identify areas that you flag for reversion / request changes.

@1313e
Collaborator

1313e commented Jan 27, 2021

Just following up: I see @1313e's point that major changes should have functionality improvements, and see you've opened a new PR for a v5 bump -- all sounds good to me. I am more ambivalent about the code changes, so @1313e when we have RC5 ready I'll be relying on you to identify areas that you flag for reversion / request changes.

Sounds good to me 👍
