Document Extended Testing Data Requirements #27

Closed
sanguineturtle opened this issue Jul 18, 2014 · 17 comments

Comments

@sanguineturtle
Contributor

This issue is to collect any data requirements we think of when writing tests that will demand infrastructure for importing a data file, etc.

  1. , , Brief reason, Brief description of data requirement
@sanguineturtle
Contributor Author

Just documenting one simple way to implement this functionality: add the following function to the quantecon utils package:

import os

def package_folder(__file__, localdir):
    """
    Simple locator for finding package folders

    Parameters
    ----------
    __file__    : pass in the calling file's location
    localdir    : specify the directory name
                  (e.g. 'data'; if the calling file is in tests/ this
                  will return an absolute reference to tests/data)

    Returns
    -------
    path        : absolute path to the package sub-directory

    Notes
    -----
    [1] This only works for local sub-directories (which is the
        majority of use cases)

    """
    # Split off the directory containing the calling file, then join
    # the requested sub-directory onto it
    this_dir, this_filename = os.path.split(__file__)
    path = os.path.join(this_dir, localdir)
    return check_directory(path)

Then import and use it:

import quantecon.util as util

fn = util.package_folder(__file__, 'data') + fn

This will return the package-level absolute path to the sub-directory 'data' relative to the file from which it is called within the package.

However, if there are a lot of files to read, this will significantly slow down the tests. Therefore, it is probably better to hold off for now and use an HDF5 file container so there is a single interaction with the filesystem.

Note: The check_directory function used above is a simple directory checker.
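
For reference, a minimal sketch of what such a checker might look like (hypothetical; the actual quantecon implementation may differ):

import os

def check_directory(path):
    # Hypothetical sketch only -- verify the directory exists before
    # returning it; the real check_directory may behave differently
    if not os.path.isdir(path):
        raise ValueError("Directory not found: %s" % path)
    return path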

@sglyon
Member

sglyon commented Aug 5, 2014

Hey @sanguineturtle, I have needed some data as I have been writing tests. For now I am just using a file at quantecon/tests/data/testing_data.h5.

I have a few functions in quantecon.tests.util that help us gain access to this file.

Does the addition of these tools resolve this issue?

@sanguineturtle
Contributor Author

@spencerlyon2 Great set of utilities!

I am fairly new to HDF, but here are my thoughts and questions.

Function: write_array

  1. Does write_array add an array to the testing_data.h5 file? Do we need to think about other data types?
  2. Should write_array check that we aren't replacing an array already in the h5 file, or do h5py or PyTables take care of this for us? (See the sketch after this list.)
  3. Should we establish a standard key convention for logically retrieving data from the h5 file?
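
On point 2, a hedged sketch of what an explicit overwrite guard could look like with h5py (this write_array is hypothetical, not the actual quantecon.tests.util implementation):

import h5py

def write_array(data_file, key, array):
    # Hypothetical overwrite guard -- h5py itself raises an error if
    # you assign to an existing dataset name, but checking explicitly
    # gives a clearer message
    with h5py.File(data_file, "a") as f:
        if key in f:
            raise KeyError("Dataset %r already exists in %s" % (key, data_file))
        f[key] = array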

Integrity of Data:

  1. How do we maintain the integrity of the testing_data.h5 file? In the past I have used the hashlib library (https://docs.python.org/2/library/hashlib.html) to generate MD5 hashes, which I then use to check that the data hasn't changed when it is accessed (once the data is static). A minimal sketch follows this list.
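
A minimal sketch of the hashlib approach (EXPECTED_MD5 is a hypothetical stored reference hash):

import hashlib

def md5sum(path, blocksize=65536):
    # Read in blocks so large data files need not fit in memory
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(blocksize), b""):
            md5.update(block)
    return md5.hexdigest()

# EXPECTED_MD5 would be recorded once the data is static
assert md5sum("testing_data.h5") == EXPECTED_MD5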

Git Considerations:

  1. Should we have a single testing_data.h5 file, or split it up by module or subpackage to reduce re-writes of a binary file to the repository?
  2. Should we leave the testing_data.h5 file untracked in the repository? As I understand it, HDF files will look like binary files to git.
  3. Should we store the testing_data.h5 file on a server for download?

If we end up using an external testing_data.h5 file, is there a way to tell nosetests to skip certain tests if the file is not found?
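
For reference, nose does support this: a test that raises SkipTest is reported as skipped rather than failed. A minimal sketch (the data path is assumed for illustration):

import os
from nose import SkipTest

DATA_FILE = "quantecon/tests/data/testing_data.h5"  # illustrative path

def test_requires_data():
    # nose reports a test that raises SkipTest as skipped, not failed
    if not os.path.exists(DATA_FILE):
        raise SkipTest("%s not found" % DATA_FILE)
    # ... test body that reads from DATA_FILE ...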

@sanguineturtle
Contributor Author

@spencerlyon2 This nosetests approach seems to work quite well for differentiating between test functions that require data and those that don't.

def test_something():
    # some test here
    pass

test_something.extdata = True

If we assign this attribute to a function, then we can skip such tests through nosetests:

nosetests -a "!extdata"

From the nosetests help:

    -a=ATTR, --attr=ATTR
        Run only tests that have attributes specified by ATTR [NOSE_ATTR]

I use this to demarcate between slow and fast tests. Have you seen any other approaches?

@sglyon
Member

sglyon commented Aug 7, 2014

That approach does work. Another one that I have been using (see here and here) is to put from nose.plugins.attrib import attr at the top of the test file and then add the decorator @attr("slow") to tests that are slow.

Then I can run nosetests -a '!slow' to get the same effect. It sounds like we found the same thing.
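
For completeness, the decorator version looks like this (the test name is illustrative):

from nose.plugins.attrib import attr

@attr("slow")
def test_expensive_routine():
    # Selected or excluded via the attribute, e.g.
    #   nosetests -a slow      (run only slow tests)
    #   nosetests -a '!slow'   (skip slow tests)
    pass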

We can also write a little makefile with targets for running all tests (probably the default), running a particular module, or running only "not slow" tests. (Can you pass extra arguments to a makefile? Something like make mod quantecon/tests/test_jv.py, which would call the mod target and pass the path to the script as an argument.)

@sanguineturtle
Contributor Author

Indeed. test_something.extdata = True adds the attribute extdata to the test_something function.

Re: make. It looks like you can, but it isn't super straightforward:
http://stackoverflow.com/questions/2214575/passing-arguments-to-make-run

Perhaps we should just write a Python script to run the tests, so that it is platform independent? We could keep it in a scripts folder at the base level of the repository; a minimal sketch is below.
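
Something along these lines, for example (a sketch only; the script name and location are assumptions):

#!/usr/bin/env python
"""scripts/runtests.py -- hypothetical cross-platform test runner."""
import subprocess
import sys

def main():
    # Pass any extra command-line arguments (e.g. a test module path,
    # or -a '!slow') straight through to nosetests
    cmd = ["nosetests"] + sys.argv[1:]
    sys.exit(subprocess.call(cmd))

if __name__ == "__main__":
    main()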

@cc7768 What does Travis CI do? Does it need a config script if we want to specify subsets of tests, or does it just run all tests?

@cc7768
Member

cc7768 commented Aug 7, 2014

@sanguineturtle @spencerlyon2 You can see exactly what Travis CI does in the .travis.yml file that is in the tests branch. Right now, all it does to run the tests is nosetests --with-coverage --cover-package=quantecon, and then, if the tests run successfully, it runs coveralls (see here). The coverage/coveralls stuff reports the percentage of the code that is covered by the tests and creates the little icon that shows that information. I would be very surprised if we couldn't work in the bells and whistles that you guys are talking about.

@sglyon
Member

sglyon commented Aug 7, 2014

@sanguineturtle I think just writing a Python script is a good way to go. We can make it executable, and it would do everything I was thinking of using make for. Good suggestion.

@sglyon
Member

sglyon commented Aug 29, 2014

@sanguineturtle, does this issue need to be open still?

@sanguineturtle
Contributor Author

@spencerlyon2 I guess I am still not clear on how we are managing the HDF data file. I know we have one, but should we consider versioning it, and how do we add data to it? In my recent experience with HDF, small things like concurrent access can corrupt files.

I think a number of items from my comments above aren't fully resolved, so it would be worth leaving this open.

@mmcky
Contributor

mmcky commented Aug 3, 2015

@spencerlyon2 Re testing_data.h5: it is version controlled within git. Do we need to write documentation for updating the data contained within it? Is that something I need to add to quantecon.org/wiki/python? When you update h5 files, do you use Python to open the file, add data, and close it, or do you use a utility like HDFView?

@sglyon
Member

sglyon commented Aug 3, 2015

I don't have any docs written for dealing with it.

Whenever I have made updates, they have been made directly with Python, not with any other application.

Also, I did a quick scan through the repo and can't see the file you mention. Any idea where it went?

@mmcky
Contributor

mmcky commented Aug 3, 2015

I have it on my local machine.

It seems to be referenced in

util.py:    data_file = join(data_dir, "testing_data.h5") 

It was deleted on August 8, 2014. Does it need to be reinstated?

@sglyon
Member

sglyon commented Aug 5, 2015

I don't think so.

I think what we decided was not to have this file checked into git. Then the first time you run the tests it will be created for you, so you have a local cache of data (see here).
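
The pattern being described is roughly this (a sketch assuming the cache is an HDF5 file opened in append mode):

import h5py

DATA_FILE = "testing_data.h5"  # illustrative path

def get_data_file():
    # Mode "a" opens the file read/write if it exists and creates it
    # otherwise, so the first test run builds the local cache
    return h5py.File(DATA_FILE, "a")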

@mmcky
Contributor

mmcky commented Aug 5, 2015

@spencerlyon2 Oh, I think I understand now. This is really just a container for caching results, rather than persistent data used for testing?

@sglyon
Member

sglyon commented Aug 5, 2015

Yep. It just speeds up the running of the tests.

If the file is missing, it will be generated when you run the tests.

@mmcky
Contributor

mmcky commented Aug 5, 2015

@spencerlyon2 Great, thanks. In that case we don't really need documentation showing how to populate the file. I had assumed there were data tables included, which is why I thought it should be version controlled. Closing the issue.

@mmcky mmcky closed this as completed Aug 5, 2015