- Overview
- Workflow Dependencies
- Workflow Steps and Scripts
- Download Workflow
- Backblaze B2 Information
The current workflow for migrating/storing digital materials into off-site cloud storage uses Ruby scripts to prepare and upload data to the Backblaze B2 Storage service. As of writing, the scripts have been tested in a Linux environment. Individual descriptions for the scripts can be found at the following links:
Installation of dependencies can be completed with the following commands:
sudo apt install ruby-all-dev
sudo apt install python
sudo apt install python-pip
sudo apt install mediainfo
sudo apt install hashdeep
sudo apt install exiftool
sudo apt install sendmail
sudo pip install b2
sudo gem install mail
NOTE: sendmail must be configured for email reporting to be functional.
Then the repository can be downloaded with the command:
git clone https://github.com/WSU-CDSC/microservices
The scripts rely on a central file of shared methods and look for it in their own directory, so make sure that the wsu-functions.rb file is always present alongside the scripts.
There is also a config file (wsu-microservices.config) that must be present in the script directory. This file sets the following options:
- Email address for metadata reports
- File path for metadata reports (if not set, will default to home directory)
- Credentials for IBM Watson
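Because the scripts expect both support files in their own directory, a quick preflight check can catch a missing file before a long run starts. The following is only a sketch: the helper name `missing_support_files` is hypothetical and not part of the repository, and it checks only that the files exist, not that the config contents are valid.

```ruby
# File names taken from the description above.
REQUIRED_FILES = %w[wsu-functions.rb wsu-microservices.config].freeze

# Return the names of any required support files that are not present
# in script_dir (hypothetical helper, not part of the repository).
def missing_support_files(script_dir)
  REQUIRED_FILES.reject { |name| File.file?(File.join(script_dir, name)) }
end
```

For example, `missing_support_files(__dir__)` called from a script returns an empty array when both files sit next to it.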
These scripts generate, maintain, and validate metadata across WSU Libraries' on-site digital storage. The metadata consists of sidecar files containing preservation, file integrity (fixity), and technical metadata: a checksum/file manifest created by Hashdeep, ExifTool output in JSON, and MediaInfo output in JSON when A/V files are detected. Additionally, preservation actions such as metadata generation/verification and cloud migration are logged in a JSON file and mapped to PREMIS vocabulary.
- Generate metadata for collections using makemeta.rb
- Upload collections to Backblaze B2 Storage using uploadaip.rb
- Perform ongoing monitoring of metadata via checkmeta.rb
- After any manual intervention necessitated by the results of checkmeta.rb, update metadata/modification time logs with makemeta.rb and (as necessary) resync to the cloud with uploadaip.rb.
The Cyberduck app provides a relatively easy and intuitive way to interface with collections stored in B2. To configure an installation of Cyberduck, please talk to Libraries Systems. Once Cyberduck is installed, you will be able to log in and view/download items stored in B2. Files and directories can be downloaded by clicking on the 'Action' tab and then selecting 'Download.' Downloading via this method will keep original file properties such as creation time.
Note: ALL Packages downloaded for archival needs should have fixity integrity validated. As packages contain hashdeep manifests, hashdeep can be used for this purpose, with the caveat that metadata files might show up as 'unexpected files.' If files are downloaded separate from an AIP, they should have their checksums generated and compared to the checksums housed in their AIP.
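As an illustration of the fixity check described in the note above, the sketch below reads a hashdeep manifest and compares each listed file against its recorded size and checksums. It is not the repository's own validation code. It assumes hashdeep's default manifest layout (a `%%%% size,md5,sha256,filename` header followed by comma-separated rows) and that relative file names in the manifest resolve against the download directory; the function names are hypothetical. Files that are present but not listed in the manifest (such as the metadata sidecars) are simply ignored.

```ruby
require 'digest'

# Digest classes for the hash columns hashdeep can write.
DIGESTS = { 'md5' => Digest::MD5, 'sha1' => Digest::SHA1, 'sha256' => Digest::SHA256 }.freeze

# Parse a hashdeep manifest into an array of { column => value } hashes.
# The column order is read from the "%%%% size,..." header when present.
def parse_manifest(manifest_path)
  columns = %w[size md5 sha256 filename] # assumed default layout
  entries = []
  File.foreach(manifest_path) do |line|
    line = line.chomp
    if line.start_with?('%%%% size,')
      columns = line.sub('%%%% ', '').split(',')
    elsif !line.empty? && !line.start_with?('%', '#')
      # Limit the split so commas inside file names are preserved.
      entries << columns.zip(line.split(',', columns.length)).to_h
    end
  end
  entries
end

# Return { file name => :ok | :missing | :mismatch } for every manifest entry.
def verify_manifest(manifest_path, base_dir)
  parse_manifest(manifest_path).each_with_object({}) do |entry, results|
    name = entry['filename']
    # Relative names resolve against base_dir; absolute names are used as-is.
    path = File.expand_path(name, base_dir)
    results[name] =
      if !File.file?(path)
        :missing
      elsif File.size(path) != entry['size'].to_i
        :mismatch
      else
        algos = DIGESTS.keys & entry.keys
        matches = algos.all? { |a| DIGESTS[a].file(path).hexdigest == entry[a] }
        matches ? :ok : :mismatch
      end
  end
end
```

Calling `verify_manifest('manifest.txt', '/path/to/downloaded/aip')` (both paths are placeholders) returns a hash that can be filtered for anything other than `:ok`.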
For basic access to files stored in B2, the web interface provides convenient browsing/download capabilities (the caveat being that the browser has limitations on folder download sizes). For most patron requests, simply navigating to the file(s) or AIP in the browser and downloading should be sufficient. If a large folder needs to be downloaded, files can be downloaded in smaller chunks, or the command line method discussed below can be used.
When collections are uploaded via aip2b2.rb, the b2 sync command is used, which stores file metadata such as modification time alongside the files. In the event of a download via the CLI (such as migration of data out of B2), it is important to use the reverse of this process to maintain this metadata. This can be done by running the sync command in reverse, with the bucket as the source and a local directory as the destination. It is suggested to do a 'dry run' download first to make sure you are using the desired paths etc. An example command is:
b2 sync --dryRun 'b2://BUCKET-NAME/PATH-TO-TARGET' '/home/myuser/Desktop/TEST'
To execute the download, simply remove the --dryRun flag.
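If the download needs to be scripted, one possible approach is to build the argument list in Ruby so that the dry run and the real run differ only by the flag. This is a sketch rather than part of the repository: `b2_download_command` is a hypothetical helper, and the `--dryRun` spelling is taken from the example command, so it is worth confirming against `b2 sync --help` for the installed CLI version.

```ruby
# Build the argument list for downloading a bucket path to a local
# directory with `b2 sync`. Dry run is the default so that a real
# download has to be requested explicitly.
def b2_download_command(bucket, remote_path, local_dir, dry_run: true)
  cmd = %w[b2 sync]
  cmd << '--dryRun' if dry_run
  cmd << "b2://#{bucket}/#{remote_path}" << local_dir
  cmd
end
```

Passing the result to `system(*cmd)` runs the command without going through a shell, which avoids quoting problems with paths that contain spaces.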
Note: ALL Packages downloaded for archival needs should have AIP fixity integrity validated
More information about the sync command is available from Backblaze in this how-to article.
Pricing (As of writing):
- Storage: $0.005 per GB per month
- Download: $0.01 per GB
- For downloads up to 3.5 TB, a physical drive can be used with Backblaze offering a service to refund cost of drive upon return.
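As a rough planning aid, the per-GB rates above can be turned into a quick estimate. The sketch below hard-codes the rates as listed here; they may have changed since writing, and the method names are hypothetical.

```ruby
# Rates as listed above (USD); update if Backblaze pricing changes.
STORAGE_USD_PER_GB_MONTH = 0.005
DOWNLOAD_USD_PER_GB = 0.01

# Estimated monthly storage cost in USD for the given number of gigabytes.
def monthly_storage_cost(gigabytes)
  (gigabytes * STORAGE_USD_PER_GB_MONTH).round(2)
end

# Estimated one-time cost in USD to download the given number of gigabytes.
def download_cost(gigabytes)
  (gigabytes * DOWNLOAD_USD_PER_GB).round(2)
end
```

For instance, 2,000 GB works out to about $10 per month to store and about $20 to download in full.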
File Integrity: SHA-1 checksums are generated and verified on upload using the sync command.
Data Location: Sacramento Area.