DEV/TST Unbundle and remove data files #336

drewejohnson · 2019-09-08T23:10:43Z

Currently, all of our test files and example files are distributed with the python package

Lines 63 to 66 in 88b5707

    
           'package_data': { 
        
               'serpentTools.data': ['data/{}'.format(ext) for ext in DATA_EXTS], 
        
           }, 
        
           'include_package_data': True,

meaning people receive a lot of matlab files that they may never need with every version. Bundling these files inside the python package really only serves to make the example notebooks work well, through the serpentTools.data.getFile function. However, since the matlab files have a lot of lines, this has the effect of making this project appear to those on GitHub that we are a Matlab project. This is more a personal annoyance, but the main issue is that we do not need to bundle these files inside the python package.

I propose that we remove these files from the python package and from the repository entirely. I propose we do this by making a new repository CORE-GATECH-GROUP/serpent-data or something like that, and include this as a git submodule. This is not unheard of for python packages. The seaborn visualization package uses the seaborn-data repository to fetch files for examples. Although seaborn downloads the files with searborn.load_dataset into a dedicated directory rather than using a submodule.

After adding this new repository as a git submodule, we will have access to all the data files, just in a different directory. This keeps the content of this repository strictly focused on the python side, while requiring developers to fetch a submodule for full testing. Thankfully git makes this pretty easy; developers can obtain submodules during the cloning process:

$ git clone --recurse-submodules

or, for existing developers who already have cloned serpentTools:

$ git submodule init
$ git submodule update

Impact

The serpentTools.data module will be removed. This is heavily used in our testing and in the example notebooks. We can use a pytest fixture to ensure that all the files are loaded, e.g. the developer has fetched the submodule and obtained the correct test files. We can fix the notebooks by reading the files as if they appeared in the working directory and using serpentTools.read on them. This will clear up some of the prefacing we include in our example notebooks telling people not to use serpentTools.readDataFile.

For travis testing, we can simply link the files from the submodule into the temporary directory so that commands like serpentTools.read("example_det0.m") will be valid.

We should also add some information in our developer guide on adding new test files and updating this submodule. Since we nearly have all the readers done, we shouldn't have to add extra files.

The biggest frustration I can see is as follows: I add a new reader that needs a new file. I then have to add the file to serpent-data through a PR, get that PR approved, update the submodule for serpentTools to use the updated serpent-data repository before I submit the new PR so that Travis can use the newest data repository.

Related reading

The text was updated successfully, but these errors were encountered:

The serpentTools.data.getFile and readDataFile functions now rely on the evnironment variable SERPENT_TOOLS_DATA. This will allow the files to be stored in any location, including outside the repository, but more importantly, outside the python package. Since this change is done inside the getFile function, none of the tests have to be modified at all. The travis build commands will have to be slightly modified to use this directory, and those changes will come shortly. Notes have been added to the developer guide to indicate the use of this variable, and how to set it. Related: GH Issue CORE-GATECH-GROUP#336

drewejohnson · 2020-08-17T14:24:59Z

Going to consider this closed with #376 and (to a lesser degree) #404. Submodules may be the way to go in the future, but that is yet to be seen

drewejohnson added testing Related to our test suite developer Developer-focused changes labels Sep 8, 2019

drewejohnson added this to the 0.10.0 milestone Nov 25, 2019

drewejohnson mentioned this issue Jan 17, 2020

DEV Unbundle data files from python package #376

Merged

drewejohnson closed this as completed Aug 17, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DEV/TST Unbundle and remove data files #336

DEV/TST Unbundle and remove data files #336

drewejohnson commented Sep 8, 2019

drewejohnson commented Aug 17, 2020

DEV/TST Unbundle and remove data files #336

DEV/TST Unbundle and remove data files #336

Comments

drewejohnson commented Sep 8, 2019

Impact

Related reading

drewejohnson commented Aug 17, 2020