Currently, all of our test files and example files are distributed with the python package (serpent-tools/setup.py, lines 63 to 66 in 88b5707), meaning people receive a lot of matlab files that they may never need with every version. Bundling these files inside the python package really only serves to make the example notebooks work well, through the serpentTools.data.getFile function. However, since the matlab files have a lot of lines, this has the effect of making the project appear to those on GitHub to be a Matlab project. That is more of a personal annoyance; the main issue is that we do not need to bundle these files inside the python package.
I propose that we remove these files from the python package and from the repository entirely by making a new repository, CORE-GATECH-GROUP/serpent-data or something like that, and including it as a git submodule. This is not unheard of for python packages: the seaborn visualization package uses the seaborn-data repository to fetch files for examples, although seaborn downloads the files with seaborn.load_dataset into a dedicated directory rather than using a submodule.
After adding this new repository as a git submodule, we will have access to all the data files, just in a different directory. This keeps the content of this repository strictly focused on the python side, while requiring developers to fetch a submodule for full testing. Thankfully git makes this pretty easy; developers can obtain submodules during the cloning process:
$ git clone --recurse-submodules
or, for existing developers who have already cloned serpentTools:
$ git submodule init
$ git submodule update
Impact
The serpentTools.data module will be removed. This module is heavily used in our testing and in the example notebooks. We can use a pytest fixture to ensure that all the files are present, i.e. that the developer has fetched the submodule and obtained the correct test files. We can fix the notebooks by reading the files as if they appeared in the working directory and using serpentTools.read on them. This will clear up some of the prefacing we include in our example notebooks telling people not to use serpentTools.readDataFile.
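As a rough sketch of the check such a fixture could perform, the helper below reports which expected files are absent from a data directory. The function names and the `EXPECTED_FILES` tuple are hypothetical and only for illustration; they are not part of serpentTools.

```python
import os

# Hypothetical list of files the tests rely on; the real list would
# come from whatever ends up in the data submodule.
EXPECTED_FILES = ("example_det0.m",)


def find_missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from data_dir."""
    return [
        name for name in expected
        if not os.path.isfile(os.path.join(data_dir, name))
    ]


def require_data_files(data_dir, expected=EXPECTED_FILES):
    """Raise with a hint about the submodule if any file is missing."""
    missing = find_missing_files(data_dir, expected)
    if missing:
        raise FileNotFoundError(
            "Missing data files in {}: {}. Did you run "
            "'git submodule init' and 'git submodule update'?".format(
                data_dir, ", ".join(missing)))
```

A session-scoped pytest fixture in a conftest.py could call `require_data_files` once, so that a forgotten submodule produces one clear message rather than many unrelated test failures.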
For Travis testing, we can simply link the files from the submodule into the temporary directory so that commands like serpentTools.read("example_det0.m") will be valid.
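One way the linking step could be done is sketched below, assuming a platform where symbolic links are available. `link_data_files` is a hypothetical helper, not an existing serpentTools function, and the directories passed to it are up to the caller.

```python
import os


def link_data_files(source_dir, target_dir):
    """Symlink every regular file in source_dir into target_dir.

    Returns the names that were newly linked. Names that already
    exist in target_dir are left untouched.
    """
    linked = []
    for name in sorted(os.listdir(source_dir)):
        source = os.path.abspath(os.path.join(source_dir, name))
        target = os.path.join(target_dir, name)
        if os.path.isfile(source) and not os.path.lexists(target):
            os.symlink(source, target)
            linked.append(name)
    return linked
```

After linking into the working directory used by the tests, a relative path such as "example_det0.m" resolves to the file stored in the submodule.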
We should also add some information in our developer guide on adding new test files and updating this submodule. Since we nearly have all the readers done, we shouldn't have to add extra files.
The biggest frustration I can see is as follows: I add a new reader that needs a new file. I then have to add the file to serpent-data through a PR and get that PR approved. After that, I have to update the submodule in serpentTools to point at the updated serpent-data repository before I submit the new PR, so that Travis can use the newest data repository.
The serpentTools.data.getFile and readDataFile functions now
rely on the environment variable SERPENT_TOOLS_DATA. This will allow
the files to be stored in any location, including outside the
repository, but more importantly, outside the python package.
Since this change is done inside the getFile function, none of the
tests have to be modified at all. The Travis build commands will
have to be slightly modified to use this directory, and those
changes will come shortly.
Notes have been added to the developer guide to indicate the use
of this variable, and how to set it.
Related: GH Issue CORE-GATECH-GROUP#336
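The environment-variable lookup described in the commit message might look roughly like the sketch below. This is not the actual serpentTools.data.getFile implementation; `get_file` and the error types are assumptions chosen for illustration, and only the variable name SERPENT_TOOLS_DATA comes from the message above.

```python
import os

DATA_ENV_VAR = "SERPENT_TOOLS_DATA"


def get_file(name):
    """Return the path to a data file stored under $SERPENT_TOOLS_DATA."""
    data_dir = os.environ.get(DATA_ENV_VAR)
    if data_dir is None:
        # Assumed behavior: fail loudly when the variable is unset.
        raise RuntimeError(
            "Environment variable {} is not set".format(DATA_ENV_VAR))
    path = os.path.join(data_dir, name)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            "File {} does not exist in {}".format(name, data_dir))
    return path
```

Because the directory is resolved at call time, tests that go through a function like this keep working wherever the data files are stored, as long as the variable points there.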
Related reading: git-submodule reference