Skip to content

ranti/ingestion

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

58 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

To install or upgrade the ingest subsystem, first install the necessary components;

$ pip install --no-deps --ignore-installed -r requirements.txt

Then set up and start the (Akara) server;

$ akara -f akara.conf setup
$ akara -f akara.conf start

You can test it with this set description from Clemson;

$ curl "http://localhost:8889/oai.listrecords.json?endpoint=http://repository.clemson.edu/cgi-bin/oai.exe&oaiset=jfb&limit=10" 

If you have the endpoint URL but not a set id, there's a separate service for listing the sets;

$ curl "http://localhost:8889/oai.listsets.json?endpoint=http://repository.clemson.edu/cgi-bin/oai.exe&limit=10"

To run the ingest process, install the script then feed it a source profile description;

$ python setup.py install
$ mkdir profiles && mkdir data
$ cat <<DONE  >profiles/myprofile.pjs
{"endpoint_URL":"http://localhost:8889/oai.listrecords.json?endpoint=http://repository.clemson.edu/cgi-bin/oai.exe&oaiset=jfb&limit=10"}
DONE
$ poll_profiles profiles data

Source profiles are represented as JSON objects. Their properties include;

  • endpoint_URL; the Akara-wrapped URL from which JSON representations are retrieved.
  • subresources; for OAI, names individual sets in an OAI store. When used, endpoint_URL should terminate with "&oaiset=" (this may change)
  • last_checked; read-only timestamp indicating the last time this source was polled
  • enrichments; ordered list of Akara enrichment services for collections, including any service specific query parameters
  • enrichments_rec; ordered list of Akara enrichment services for records, including any service specific query parameters

Enrichment pipelines are implemented through a central enrichment service which interprets the list of other services as communicated via a "Pipeline" HTTP header on a POST request. For example, given a data.sjs data document, the following request will send that data through the provided pipeline;

$ curl -X POST -d @data.sjs -H "Pipeline: http://localhost:8889/geocode?p=location" http://localhost:8889/enrich

Current enrichment services are;

  • shred/unshred; ',' based string/list and list/string (de)construction. The "prop" parameter specifies which property is to be shredded/unshredded
  • geocode; replaces the named property (via the "prop" parameter) with its lat/long. NOTE; in order to use geo lookups, the geonames sqlite file has to be created using the instructions and stored in the "caches" directory below the home directory of akara.conf
  • select-id; creates or updates an "id" property to the value of the property named by the "prop" parameter

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published