Valdeva Crema edited this page Sep 5, 2018 · 11 revisions

Welcome to the Spotlight OAI-PMH MODS harvester. The harvester imports sets of MODS or Solr metadata into Spotlight when given a URL, a set, and a mapping file. The mapping file is a YAML file that maps Spotlight fields to MODS paths or Solr paths. The OAI-PMH harvester relies on two gems: mods and ruby-oai.

OAI-PMH YAML Mapping File

The format takes this form:

`- spotlight-field: xxx (field names should be lowercase, separated with dashes except for the suffix: firstpart-secondpart_ssim or _tesim) 
   multivalue-breaks: "yes" (optional) - use this to split multiple values so that each one is broken out and faceted on individually (e.g. subjects)
   default-value: xxx (optional)
   delimiter: xxx (optional, what to separate all path values with.  Defaults to a space)
   mods:
     - path: xxx (repeatable - all path fields will be concatenated)
       delimiter: xxx (optional)
       attribute: xxx (optional)
       attribute-value: xxx (optional but paired with attribute)
       mods-path: xxx (optional)
       mods-value: xxx (optional)
       subpaths: (optional)
         - subpath: xxx
         - subpath: xxx
   xpath: 
     - xpath-value: xxx (repeatable)
       xpath-namespace-prefix: xxx (optional)
       xpath-namespace-def: xxx (optional)`

NOTE: The spotlight-field name is derived from the custom metadata field that you first add in Spotlight. For example, 'Creation Date' becomes creation-date_ssim or creation-date_tesim, and 'CAPITOL Name' becomes capitol-name_ssim or capitol-name_tesim.
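The naming rule in the note can be sketched in Ruby. The helper `spotlight_field_name` is hypothetical and not part of the harvester; it only assumes what the two examples show (lowercase the label, replace whitespace with dashes, append the suffix).

```ruby
# Hypothetical helper (not part of the harvester): derive a mapping
# field name from a Spotlight custom field label.
def spotlight_field_name(label, suffix)
  # Lowercase the label and turn runs of whitespace into single dashes.
  base = label.strip.downcase.gsub(/\s+/, "-")
  "#{base}_#{suffix}"
end

puts spotlight_field_name("Creation Date", "ssim")  # prints creation-date_ssim
puts spotlight_field_name("CAPITOL Name", "tesim")  # prints capitol-name_tesim
```

How Spotlight treats punctuation or other special characters in a label is not covered here, so check the generated field name in Spotlight before relying on it.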

The following working examples show how the fields are used. The most basic mapping:

`- spotlight-field: unique-id_tesim
   mods:
     - path: recordInfo/recordIdentifier`
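To check that a mapping file is well-formed YAML before handing it to the harvester, it can be loaded with Ruby's standard YAML library. This is a minimal sketch; `summarize_mapping` is a hypothetical helper, and the mapping is inlined as a string so the example runs on its own.

```ruby
require "yaml"

# The most basic mapping from above, inlined so the sketch is self-contained.
BASIC_MAPPING_YAML = <<~YAML
  - spotlight-field: unique-id_tesim
    mods:
      - path: recordInfo/recordIdentifier
YAML

# Hypothetical helper: parse a mapping and return
# [field name, list of MODS paths] pairs.
def summarize_mapping(yaml_text)
  YAML.safe_load(yaml_text).map do |entry|
    paths = (entry["mods"] || []).map { |m| m["path"] }
    [entry["spotlight-field"], paths]
  end
end

summarize_mapping(BASIC_MAPPING_YAML).each do |field, paths|
  puts "#{field} <- #{paths.join(', ')}"
end
# prints unique-id_tesim <- recordInfo/recordIdentifier
```

A YAML syntax error (for example, inconsistent indentation) raises an exception at the `safe_load` call, which is a quick way to catch a malformed mapping file.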

Use of attributes. This retrieves the dateCreated value whose point attribute is start (for example, the start date 1788):

`- spotlight-field: start-date_tesim
   mods:
     - path: originInfo/dateCreated
       attribute: point
       attribute-value: start`

Use of subpaths:

`- spotlight-field: subjects_ssim
   delimiter: "|"
   mods:
       - path: subject
         delimiter: "--"
         subpaths:
           - subpath: name/namePart
           - subpath: topic
           - subpath: geographic
           - subpath: genre`
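The two delimiters in the subpaths example work at different levels. The sketch below shows one reading of the mapping, not the harvester's own code: the path-level delimiter ("--") joins the subpath values found under a single subject, and the field-level delimiter ("|") joins the resulting subjects. The sample subject values are hypothetical.

```ruby
# Hypothetical sample data: each inner array holds the subpath values
# found under one <subject> element.
SAMPLE_SUBJECTS = [
  ["Agriculture", "Massachusetts"],
  ["Farmers", "Diaries"]
].freeze

# Join subpath values with the path-level delimiter, then join the
# resulting path values with the field-level delimiter.
def join_subjects(subjects, path_delimiter: "--", field_delimiter: "|")
  subjects.map { |parts| parts.join(path_delimiter) }.join(field_delimiter)
end

puts join_subjects(SAMPLE_SUBJECTS)
# prints Agriculture--Massachusetts|Farmers--Diaries
```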

Use of mods-path to get the creator:

`- spotlight-field: creator_tesim
   mods:
       - path: plain_name
         delimiter: " , "
         mods-path: role/roleTerm
         mods-value: creator
         subpaths:
           - subpath: namePart`

Lastly, an exclamation mark can be used in the values (attribute-value or mods-value) or in the attribute itself to exclude matches. The first example gives the date that has no 'point' attribute:

`- spotlight-field: date_tesim
   mods:
       - path: originInfo/dateCreated
         attribute: '!point'
         attribute-value: (note - this key must be present but is left blank because no actual value is needed)`

The second example gives any name whose role is not creator:

`- spotlight-field: contributer_tesim
   delimiter: " , "
   mods:
       - path: plain_name
         delimiter: " , "
         mods-path: role/roleTerm
         mods-value: '!creator'
         subpaths:
           - subpath: namePart`
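The exclusion rule can be modelled with a small predicate. This is a sketch under stated assumptions, not the harvester's implementation: nodes are represented as plain Ruby hashes rather than parsed MODS, `attribute_match?` and `select_text` are hypothetical helpers, and a negated value is assumed to match only nodes that carry the attribute with a different value. Only the 1788 start date comes from the examples above; the other sample values are made up.

```ruby
# Assumed semantics (not taken from the harvester's code):
#   "point"  with "start"  -> attribute present and equal to "start"
#   "point"  with "!start" -> attribute present and not equal to "start"
#   "!point"               -> attribute absent (the value is ignored)
def attribute_match?(attrs, attribute, value)
  if attribute.start_with?("!")
    !attrs.key?(attribute[1..-1])
  elsif value.to_s.start_with?("!")
    attrs.key?(attribute) && attrs[attribute] != value[1..-1]
  else
    attrs.key?(attribute) && attrs[attribute] == value
  end
end

# Hypothetical stand-ins for three dateCreated elements.
SAMPLE_DATES = [
  { attrs: { "point" => "start" }, text: "1788" },
  { attrs: { "point" => "end" },   text: "1790" },
  { attrs: {},                     text: "circa 1789" }
].freeze

# Return the text of every node that satisfies the attribute rule.
def select_text(nodes, attribute, value)
  nodes.select { |n| attribute_match?(n[:attrs], attribute, value) }
       .map { |n| n[:text] }
end

puts select_text(SAMPLE_DATES, "point", "start").join(", ")   # prints 1788
puts select_text(SAMPLE_DATES, "!point", nil).join(", ")      # prints circa 1789
puts select_text(SAMPLE_DATES, "point", "!start").join(", ")  # prints 1790
```

The same pattern would apply to mods-path and mods-value, with the role term playing the part of the attribute value.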

A sample mapping file used for our virtual collections can be found here: vc_mapping.yml

Solr YAML Mapping File

`- spotlight-field: xxx (field names should be separated with dashes except for the suffix: firstpart-secondpart_ssim or _tesim)
   multivalue-breaks: "yes" (optional) - use this to split multiple values so that each one is broken out and faceted on individually (e.g. subjects)
   default-value: xxx (optional)
   delimiter: xxx (optional, what to separate all path values with.  Defaults to a space)
   solr-field:
     - field-name: xxx (repeatable - all path fields will be concatenated)`
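A Solr mapping entry can be illustrated in the same way. The sketch below is an assumption-laden model, not the harvester's code: the Solr field names (`title_display`, `subtitle_display`), the sample document, and the helper `map_solr_doc` are all hypothetical, and default-value is assumed to apply only when none of the listed fields has a value.

```ruby
require "yaml"

# Hypothetical Solr mapping entry following the template above.
SOLR_MAPPING_YAML = <<~YAML
  - spotlight-field: full-title_tesim
    delimiter: " - "
    solr-field:
      - field-name: title_display
      - field-name: subtitle_display
YAML

# Hypothetical Solr document, reduced to a plain hash.
SAMPLE_DOC = {
  "title_display" => "Letters",
  "subtitle_display" => "1788-1790"
}.freeze

# Concatenate the values of all listed Solr fields with the delimiter
# (a space when none is given); fall back to default-value if empty.
def map_solr_doc(doc, mapping)
  mapping.each_with_object({}) do |entry, out|
    delimiter = entry["delimiter"] || " "
    values = entry["solr-field"].map { |f| doc[f["field-name"]] }.compact
    out[entry["spotlight-field"]] =
      values.empty? ? entry["default-value"] : values.join(delimiter)
  end
end

result = map_solr_doc(SAMPLE_DOC, YAML.safe_load(SOLR_MAPPING_YAML))
puts result["full-title_tesim"]  # prints Letters - 1788-1790
```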