Administration of running system

Jan Tomášek edited this page Dec 15, 2022 · 14 revisions

This page describes how data is organized in ARCLib so that the administrator can explore and debug it. It also describes how to debug a running ARCLib instance and how logging is organized.

Organization of data

Transfer area

  • The transfer area is mapped to the folder set by the fileStorage parameter in application.yml.
  • Every producer has its own folder in the transfer area (an attribute of the producer entity).
  • Every ingest routine can be mapped to a custom folder nested anywhere inside the folder of the related producer by setting its transfer area attribute. If the attribute is not set, the routine is mapped to the root folder of the producer.
  • The producer adds SIP packages to the folder mapped to the routine.
  • When the routine job is triggered by its CRON schedule, ARCLib worker nodes start copying SIPs into the ARCLib workspace as part of the ingest workflow. Only SIPs located directly in the mapped folder are processed.
  • If the parameter deleteSipFromTransferArea in application.yml is set to true, SIP packages are deleted after the ingest workflow finishes successfully.
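
The mapping rules above can be sketched as follows. This is a minimal illustration, not ARCLib's own code: the function names are hypothetical, and it assumes that SIP archives are recognized by their .zip extension.

```python
from pathlib import Path
from typing import List, Optional


def routine_folder(file_storage: Path, producer_folder: str,
                   routine_transfer_area: Optional[str] = None) -> Path:
    """Resolve the folder an ingest routine reads SIPs from.

    file_storage stands for the folder set by the fileStorage parameter.
    If the routine has no transfer area attribute, the producer root is used.
    """
    producer_root = file_storage / producer_folder
    if not routine_transfer_area:
        return producer_root
    return producer_root / routine_transfer_area


def sips_directly_in(folder: Path) -> List[Path]:
    """List .zip files located directly in the folder (no recursion)."""
    return sorted(p for p in folder.iterdir()
                  if p.is_file() and p.suffix == ".zip")
```

A package placed in a subfolder of the mapped folder does not appear in the result of sips_directly_in, which mirrors the rule that only SIPs located directly in the mapped folder are processed.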

Requirements for the SIP packages saved in transfer area

  • SIP packages must respect the format packageName.zip + packageName.sums; the file extensions .zip and .sums are compulsory
  • the root folder contained in the zipped file must have the same name as the zipped file (without the .zip file extension)
  • packageName.sums must be encoded in UTF-8
  • the content of packageName.sums must respect the format {FIXITY TYPE} {FIXITY VALUE}, where the fixity type is one of MD5, Crc32, Sha512, e.g. MD5 8501E49E4A2A5FA4BD1843E674C1B1D5
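
A package that satisfies these requirements could be prepared with a short script such as the one below. It is a sketch under stated assumptions: the helper name write_sip is hypothetical, and the fixity value is assumed to be the checksum of the .zip file itself, written in uppercase hexadecimal as in the example above.

```python
import hashlib
import zipfile
from pathlib import Path


def write_sip(source_dir: Path, target_dir: Path) -> Path:
    """Zip source_dir into target_dir and write the matching .sums file."""
    package_name = source_dir.name
    zip_path = target_dir / f"{package_name}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # Keep the package name as the root folder inside the archive.
                arcname = Path(package_name) / path.relative_to(source_dir)
                archive.write(path, arcname.as_posix())
    # Assumption: the fixity value is the checksum of the .zip file itself.
    digest = hashlib.md5(zip_path.read_bytes()).hexdigest().upper()
    sums_path = target_dir / f"{package_name}.sums"
    sums_path.write_text(f"MD5 {digest}", encoding="utf-8")
    return zip_path
```

The target directory should lie outside the source directory; otherwise the archive being written would be picked up while walking the source tree.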

Workspace

  • the workspace is by default mapped to the workspace folder located at the root of the project (created automatically if it does not exist); the path to the folder is configurable in application.yml
  • every ingest workflow has its own folder in the workspace, named after its XML ID, e.g. ARCLIB_000000004
  • this folder contains the .zip file with the SIP content and the unpacked SIP content
  • after the ingest workflow has finished, the folder belonging to the ingest workflow is deleted

Quarantine

  • the quarantine folder is by default mapped to the quarantine folder located in the Workspace folder (created automatically if it does not exist); the folder path is configurable in application.yml
  • when a SIP is moved to quarantine, a folder named after the XML ID (e.g. ARCLIB_000000004) is created in the quarantine folder, and both the .zip file with the SIP content and the unzipped SIP content are moved to this folder
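
When exploring the quarantine, it can help to list which ingest workflows currently have a folder there. The sketch below is illustrative only: the function name is hypothetical, and the ID pattern is inferred from the single example ARCLIB_000000004, so it may need adjusting for a real deployment.

```python
import re
from pathlib import Path
from typing import List

# Assumption: XML IDs look like the example ARCLIB_000000004.
XML_ID_PATTERN = re.compile(r"^ARCLIB_\d+$")


def quarantined_ids(quarantine_dir: Path) -> List[str]:
    """Return the XML IDs that have a folder directly in the quarantine."""
    if not quarantine_dir.is_dir():
        return []
    return sorted(p.name for p in quarantine_dir.iterdir()
                  if p.is_dir() and XML_ID_PATTERN.match(p.name))
```

With the default layout described above, the argument would be the quarantine folder inside the workspace folder.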

Debug archival storage

  • the place where SIP packages are stored instead of the real Archival storage when the debugging mode is activated
  • represented by a folder named arcStorageData located in Workspace

List of indexed attributes of ARCLibXml

The indexing of ARCLibXml is defined in the file arclibXmlIndexConfig.csv.

Its content originates from the ARCLib XML Index Config.

Debugging

Debugging ARCLib running as a system service named arclib:

  1. sudo systemctl status arclib

  2. journalctl -u arclib --since "2018-01-24 23:25:50" --no-pager (replace 2018-01-24 23:25:50 with the desired start time in the format YYYY-MM-DD HH:MM:SS)
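
If the log query is scripted, the timestamp can be formatted programmatically instead of being typed by hand. The helper below is a hypothetical convenience that only builds the argument list for the journalctl call shown above; it does not run it.

```python
from datetime import datetime
from typing import List


def journalctl_command(since: datetime, unit: str = "arclib") -> List[str]:
    """Build the journalctl invocation for the given service unit."""
    # journalctl accepts timestamps in the form YYYY-MM-DD HH:MM:SS.
    stamp = since.strftime("%Y-%m-%d %H:%M:%S")
    return ["journalctl", "-u", unit, "--since", stamp, "--no-pager"]
```

The resulting list can be passed to subprocess.run on the host where the arclib service runs.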

Accessing the debugging version of Archival storage running at /arclib/debug:

  1. exportAip: /aip/export/{aipId} - exports AIP
  2. exportXml: /aip/export/{aipId}/xml - exports the latest or the specified version of ARCLibXml
  3. forget: /{batchId}/forget - deletes batch and all its respective ingest workflows, applicable only for batches processed using a producer profile in the debugging mode
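
The endpoint paths can be assembled as in the sketch below. It assumes that the listed paths are relative to /arclib/debug; the host, the HTTP method of each endpoint, and the way a specific ARCLibXml version is selected are not stated on this page and are therefore left out. The function names are hypothetical.

```python
from urllib.parse import quote

DEBUG_ROOT = "/arclib/debug"  # assumed common prefix of the debug endpoints


def export_aip_path(aip_id: str) -> str:
    """Path of the exportAip endpoint for the given AIP."""
    return f"{DEBUG_ROOT}/aip/export/{quote(aip_id, safe='')}"


def export_xml_path(aip_id: str) -> str:
    """Path of the exportXml endpoint for the given AIP."""
    return f"{export_aip_path(aip_id)}/xml"


def forget_path(batch_id: str) -> str:
    """Path of the forget endpoint for the given batch."""
    return f"{DEBUG_ROOT}/{quote(batch_id, safe='')}/forget"
```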

Logging

Four log files are created each day, each containing a specific group of logs:

  1. arclib.{DATE}.log - info log file

  2. arclib.debug.{DATE}.log - debug log file

  3. arclib.error.{DATE}.log - error log file

  4. arclib.audit.{DATE}.log - audit log file
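
When scanning a log directory, the naming scheme above is enough to tell the four groups apart. The following sketch classifies a file name by group; the function name is hypothetical, and because the exact format of {DATE} is not specified here, the date part is returned as an uninterpreted string.

```python
import re
from typing import Optional, Tuple

# arclib.{DATE}.log is the info log; the other groups carry an extra marker.
LOG_NAME = re.compile(r"^arclib\.(?:(debug|error|audit)\.)?(.+)\.log$")


def classify_log_file(name: str) -> Optional[Tuple[str, str]]:
    """Return (group, date part) for an ARCLib log file name, else None."""
    match = LOG_NAME.match(name)
    if match is None:
        return None
    group = match.group(1) or "info"
    return group, match.group(2)
```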

There are five log levels plus audit logs:

Log level  Console  Debug log file  Info log file  Error log file  Audit log file  Description
ERROR      X        X               X              X               -               fatal errors
WARN       X        X               X              -               -               unexpected states that are not fatal errors
INFO       X        X               X              -               -               high level information
DEBUG      X        X               -              -               -               medium level information
TRACE      X        -               -              -               -               low level information
AUDIT      X        -               -              -               X               audit information
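
The table above can also be read as a lookup from log level to destinations, which is handy when deciding which file to open for a given kind of message. The sketch below simply transcribes the table; the short destination labels and the function name are made up for this example.

```python
from typing import Dict, FrozenSet, List

# Transcribed from the table: where entries of each level are written.
DESTINATIONS: Dict[str, FrozenSet[str]] = {
    "ERROR": frozenset({"console", "debug", "info", "error"}),
    "WARN": frozenset({"console", "debug", "info"}),
    "INFO": frozenset({"console", "debug", "info"}),
    "DEBUG": frozenset({"console", "debug"}),
    "TRACE": frozenset({"console"}),
    "AUDIT": frozenset({"console", "audit"}),
}


def levels_in(destination: str) -> List[str]:
    """List the log levels that are written to the given destination."""
    return [level for level, targets in DESTINATIONS.items()
            if destination in targets]
```

For example, levels_in("info") lists the levels found in the info log file, and levels_in("console") lists every level, since all of them are printed to the console.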