
SIP Format

Jan Tomášek edited this page Dec 14, 2020 · 6 revisions

SIP format in general

Every SIP to be consumed by ARCLib must respect the following general rules:

  • SIP is a .zip file that contains exactly one root folder and no files at its top level
  • the root folder must have the same name as the .zip file (without the .zip extension)
  • SIP must contain one XML metadata file from which the metadata are extracted into the AIP XML
  • SIP must contain a metadata file with the Authorial ID of the SIP; the path to the ID must be expressible in XPath 3.1
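The structural rules above (one root folder, no top-level files, root folder named after the .zip) can be checked before upload. The following is a minimal sketch in Python using only the standard `zipfile` module. The function names `check_sip_entries` and `check_sip_layout` are hypothetical and are not part of ARCLib. The sketch does not cover the metadata-file and XPath rules.

```python
import os
import zipfile


def check_sip_entries(zip_name: str, entry_names: list[str]) -> list[str]:
    """Check the structural SIP rules against a list of zip entry names.

    Hypothetical helper: returns a list of violations (empty if the layout is valid).
    """
    problems = []
    # the root folder must equal the zip file name without its .zip extension
    expected_root = os.path.splitext(os.path.basename(zip_name))[0]
    roots = set()
    for name in entry_names:
        first, sep, _rest = name.partition("/")
        if not sep:
            # an entry without "/" is a file lying directly at the top level
            problems.append(f"top-level file not allowed: {name}")
        else:
            roots.add(first)
    if len(roots) != 1:
        problems.append(f"expected exactly one root folder, found {sorted(roots)}")
    elif expected_root not in roots:
        problems.append(f"root folder {next(iter(roots))!r} does not match {expected_root!r}")
    return problems


def check_sip_layout(zip_path: str) -> list[str]:
    """Open a SIP .zip file and return all structural violations."""
    with zipfile.ZipFile(zip_path) as zf:
        return check_sip_entries(zip_path, zf.namelist())
```

For example, `check_sip_layout("pkg1.zip")` should return an empty list only if every entry sits under a single `pkg1/` folder.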

ARCLib's ability to support various SIP formats is based mainly on ARCLib SIP profiles.

SIP with one main METS file

Most of the currently supported SIP formats contain one main METS file, which describes the whole package and contains checksums for all files in addition to the descriptive and administrative metadata. For such SIPs, the Package Type attribute of the SIP profile is set to METS.
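The following minimal sketch shows where a METS-based fixity check can find its expected values. It collects the checksums declared in a METS file using Python's standard `xml.etree.ElementTree`. The sketch assumes the standard METS schema, in which each `mets:file` element carries `CHECKSUM` and `CHECKSUMTYPE` attributes and points to its content through a `mets:FLocat` `xlink:href`. The function name `mets_checksums` is hypothetical, and this is not ARCLib's own implementation.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def mets_checksums(mets_xml: str) -> dict[str, tuple[str, str]]:
    """Map each file path referenced by a METS file to (CHECKSUMTYPE, CHECKSUM).

    Hypothetical helper; files without a declared checksum or location are skipped.
    """
    root = ET.fromstring(mets_xml)
    result = {}
    # mets:file elements live in the fileSec (inside mets:fileGrp)
    for file_el in root.iter(f"{{{METS_NS}}}file"):
        checksum = file_el.get("CHECKSUM")
        if checksum is None:
            continue  # no checksum declared for this file
        flocat = file_el.find(f"{{{METS_NS}}}FLocat")
        if flocat is None:
            continue  # no location, so nothing to verify against
        href = flocat.get(f"{{{XLINK_NS}}}href")
        if href is None:
            continue
        result[href] = (file_el.get("CHECKSUMTYPE", ""), checksum)
    return result
```

A fixity checker could then hash each referenced file with the declared algorithm and compare the result with the stored value.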

SIP with multiple METS files

If the SIP does not contain one main METS file describing the whole package, it can still be persisted into ARCLib. However, the general rule still applies: only one METS file may be chosen as the source of metadata extraction into the AIP XML.

Such packages can be ingested with a SIP profile whose Package Type is METS. In that case, however, the fixity check performed by the fixity checker BPM task is limited to the files described in the chosen METS file.
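The following small sketch makes this limitation concrete. It lists the files in a package that the chosen METS file does not describe, which are the files a METS-based fixity check would skip. The function `files_outside_mets` is hypothetical. It assumes the METS `xlink:href` values are paths relative to the SIP root folder; if a METS file uses paths relative to its own subfolder, they would need to be resolved differently.

```python
import posixpath


def files_outside_mets(root_folder: str, zip_entries: list[str], mets_hrefs: list[str]) -> list[str]:
    """Return zip file entries not referenced by the chosen METS file.

    Hypothetical helper. Assumption: hrefs are relative to the SIP root folder,
    e.g. "./a/data.txt" inside the root folder "pkg1".
    """
    described = {posixpath.normpath(posixpath.join(root_folder, h)) for h in mets_hrefs}
    files = [e for e in zip_entries if not e.endswith("/")]  # skip directory entries
    return sorted(f for f in files if posixpath.normpath(f) not in described)
```

Note that the METS file itself usually appears in the result, because a METS file does not normally list a checksum for itself.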

For that reason, ARCLib also supports the BAGIT Package Type. With this type, the fixity checker uses the BagIt checksum (manifest) files instead of the METS fileSec section. A producer who wants to use this must pack the original SIP into a BagIt bag and zip it (the .zip must still follow the general rules).
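The following minimal sketch shows a BagIt-style fixity check. BagIt payload manifests (files such as `manifest-md5.txt` or `manifest-sha256.txt`) contain one line per payload file in the form `<checksum> <path>`, with paths relative to the bag's base directory (for example `data/a.txt`). The function `verify_bag_manifest` is hypothetical. It assumes the payload has already been read into a mapping from bag-relative path to bytes. It is not ARCLib's fixity checker.

```python
import hashlib
from collections.abc import Mapping


def verify_bag_manifest(manifest_text: str, algorithm: str, payload: Mapping[str, bytes]) -> list[str]:
    """Return manifest paths whose content is missing or whose checksum does not match.

    Hypothetical helper; `algorithm` is the name from the manifest file name,
    e.g. "md5" for manifest-md5.txt, passed to hashlib.new().
    """
    failures = []
    for line in manifest_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # "<checksum> <path>": split once so paths containing spaces stay intact
        expected, path = line.split(maxsplit=1)
        data = payload.get(path)
        if data is None or hashlib.new(algorithm, data).hexdigest() != expected.lower():
            failures.append(path)
    return failures
```

In practice, `payload` could be filled from the zipped SIP by reading each entry under the bag's root folder with `zipfile.ZipFile.read` and stripping the root-folder prefix. The sketch leaves out tag manifests and the percent-encoded file names allowed by the BagIt specification.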
