autooai

Automatically create an OAI harvester for the SHARE Project.

To generate a harvester automatically, you need an API endpoint that returns XML in standard OAI-PMH format. That endpoint is your base URL; pass it to the command line interface.

For example, MIT has an OAI-PMH endpoint, and one way to access it is: http://dspace.mit.edu/oai/request?verb=Identify
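As a small sketch of how a base URL and an OAI-PMH verb combine into a request URL like the one above: the helper name `oai_request_url` is hypothetical and is not part of autooai.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **params):
    """Append an OAI-PMH verb (and any extra arguments) to a base URL."""
    query = urlencode({"verb": verb, **params})
    return "{}?{}".format(base_url, query)


# Rebuilds the MIT Identify request from its base URL.
print(oai_request_url("http://dspace.mit.edu/oai/request", "Identify"))
```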

Because this tool is specific to the SHARE project, run its commands from inside your autooai directory, which should sit at the same level as your scrapi (or SHARE core) directory.

Your directory structure should be something like this:

code
├── autooai
└── scrapi

That way, newly generated OAI harvesters are written to the correct folder within your scrapi instance, namely:

scrapi/scrapi/harvesters
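To illustrate why the sibling layout matters, here is a minimal sketch of how the target folder can be derived from the autooai directory. The function name `harvesters_dir` is an assumption for illustration; the real tool may resolve the path differently.

```python
from pathlib import PurePosixPath


def harvesters_dir(autooai_dir):
    """Return the harvesters folder of the scrapi checkout next to autooai."""
    parent = PurePosixPath(autooai_dir).parent
    return parent / "scrapi" / "scrapi" / "harvesters"


print(harvesters_dir("code/autooai"))  # code/scrapi/scrapi/harvesters
```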

Setup

From within the autooai directory, install the requirements with pip inside a virtual environment:

pip install -r requirements.txt

Once you've installed all the requirements, you're ready to get started generating OAI-PMH harvesters for SHARE!

Generating a Harvester

autooai is a command line tool that generates a SHARE harvester from a few arguments.

Here's an example of how to use this tool to generate a SHARE OAI harvester for the MIT repository:

python autooai/main.py -b http://dspace.mit.edu/oai/request -s mit -f

This will do a few things:

- Use the base URL http://dspace.mit.edu/oai/request to generate a harvester.
  - The base URL is the beginning of the OAI endpoint: everything before the ? in the OAI-PMH request URL.
  - Example: http://repository.stcloudstate.edu/do/oai/
  - Not: http://repository.stcloudstate.edu/do/oai/?verb=Identify
- Use mit as the shortname when generating the harvester.
- Save the MIT favicon to the proper directory within scrapi (scrapi/img/favicons).
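If you only have a full request URL on hand, the "everything before the ?" rule can be sketched as follows. The helper `base_url_from` is hypothetical and not something autooai provides.

```python
def base_url_from(request_url):
    """Keep everything before the first '?' in an OAI-PMH request URL."""
    return request_url.split("?", 1)[0]


print(base_url_from("http://repository.stcloudstate.edu/do/oai/?verb=Identify"))
```

A URL that has no query string is returned unchanged.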

Here's the main usage:

usage: main.py [-h] -b BASEURL -s SHORTNAME [-f]

A command line interface to create and commit a new harvester

required arguments:
  -b BASEURL, --baseurl BASEURL
                        The base url for the OAI provider, everything before
                        the ?
  -s SHORTNAME, --shortname SHORTNAME
                        The shortname of the provider

optional arguments:
  -f, --favicon         flag to signal saving favicon

  -h, --help            show this help message and exit
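For readers unfamiliar with this kind of interface, the following is a rough sketch of an argparse parser that accepts the same flags as the usage text above. It is an approximation written for illustration, not the actual contents of main.py.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="A command line interface to create and commit a new harvester",
    )
    required = parser.add_argument_group("required arguments")
    required.add_argument("-b", "--baseurl", required=True,
                          help="The base url for the OAI provider, everything before the ?")
    required.add_argument("-s", "--shortname", required=True,
                          help="The shortname of the provider")
    parser.add_argument("-f", "--favicon", action="store_true",
                        help="flag to signal saving favicon")
    return parser


# Parse the MIT example from an explicit argument list.
args = build_parser().parse_args(
    ["-b", "http://dspace.mit.edu/oai/request", "-s", "mit", "-f"]
)
print(args.baseurl, args.shortname, args.favicon)
```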

Running your new harvester

Assuming you've already done all of the setup for scrapi, you're ready to run the harvester you've just generated, and try to gather some data into scrapi.

Enter the scrapi directory, which sits next to your current autooai directory:

cd ../scrapi

Run the harvester using invoke and the shortname you created the harvester with:

invoke harvester insert-shortname-here

You can then check the results on your local Elasticsearch instance at http://localhost:9200/share_v2/_search
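A quick way to confirm that the harvester stored something is to read the hit count from that search endpoint. The sketch below assumes the standard Elasticsearch search response shape; the function names are hypothetical, and `fetch_total` only works while your local instance is running.

```python
import json
from urllib.request import urlopen

SEARCH_URL = "http://localhost:9200/share_v2/_search"


def total_hits(body):
    """Read the hit count from a parsed Elasticsearch search response."""
    total = body["hits"]["total"]
    # Older Elasticsearch versions return an integer here;
    # newer ones return an object such as {"value": n, "relation": "eq"}.
    return total["value"] if isinstance(total, dict) else total


def fetch_total(url=SEARCH_URL):
    """Query the local index and return the number of stored documents."""
    with urlopen(url) as response:
        return total_hits(json.load(response))
```

Calling `fetch_total()` after a harvester run should return a number greater than zero if documents were indexed.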

If you're running the OSF locally, you can explore search results on localhost:5000/share after running invoke provider_map

Run tests on scrapi, including your newly created harvester test, with invoke test

Potential Pitfalls

elasticsearch index errors

On a new scrapi setup, you may have to alias the share index to the most current version:

invoke alias share share_v2

Failing tests

Your automatically created test may fail in scrAPI the first time it runs. If so, you can record a new vcr file in scrAPI, which will hopefully work.

- Delete the old vcr file inside scrapi/tests/vcr/shortname.py
- Change the date within the "freeze time" decorator on line 14 to a date on which you know the harvester had results. For example: @freeze_time("2014-03-15")
- Inside scrapi/tests/test_harvesters.py, change the 'record_mode' on line 22 to 'once'. It should now read: with vcr.use_cassette('tests/vcr/{}.yaml'.format(harvester_name), match_on=['host'], record_mode='once'):
- Re-run the tests with invoke test
- Make sure not to commit these changes to test_harvesters.py
