Tools for handling the splitting (and unsplitting) of BagIt archives. Made with Python.

MetaArchive/bagit-split

NOTE: As of 2023 this tool is no longer maintained or supported.

BagIt Splitting and Unsplitting Tools

Overview

This tool provides functionality for splitting BagIt bags into a collection of smaller bags, and for "unsplitting" these bags back into a single bag mostly identical to the original.

Usage

To see the full command-line help text, run:

$ python bag-split.py --help

Prerequisites

None. Note, however, that this script operates on split bags created by the Library of Congress's bagit-java tool.

Splitting a Bag

$ bag splitbagbysize <BAG> --maxbagsize 30
$ python bag-split.py split <BAG>

The first command above uses the official BagIt command-line utility (bag) to split the original <BAG>, in this example using 30 GB as the per-bag limit. You can also use fractional --maxbagsize values; .001, for example, indicates 1 MB.
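
As a rough illustration of how the --maxbagsize value relates to a size in megabytes, the following sketch builds the argument list for the first command. It assumes only what the examples above state (30 means 30 GB, .001 means 1 MB, so 1 GB is treated as 1000 MB); the helper names maxbagsize_arg and split_command are hypothetical and not part of this tool or of bagit-java.

```python
from typing import List


def maxbagsize_arg(size_mb: float) -> str:
    """Express a per-bag limit given in MB as a --maxbagsize value."""
    # Assumption taken from the examples above: 30 -> 30 GB, .001 -> 1 MB,
    # so the value is in GB with 1 GB treated as 1000 MB.
    return f"{size_mb / 1000:g}"


def split_command(bag_path: str, size_mb: float) -> List[str]:
    """Build the argument list for the splitbagbysize command shown above."""
    return ["bag", "splitbagbysize", bag_path,
            "--maxbagsize", maxbagsize_arg(size_mb)]
```

The resulting list could be passed to subprocess.run, provided the bag utility is installed and on the PATH.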

The second command uses this tool to verify the split bags against the original bag for integrity and completeness, as well as to create an additional "metadata" bag among the split bags; the /data directory of the metadata bag will contain the original bag's manifests and bag-info.txt file.
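
To give a sense of what a completeness check like this can involve, here is a minimal sketch that compares the payload manifest of the original bag with the combined manifests of the split bags. It is not the code used by bag-split.py. It assumes that every bag carries a manifest-md5.txt in the standard BagIt "checksum path" line format, that split bags keep the original relative payload paths, and that it runs before the "metadata" bag is added. The function names are hypothetical, and only manifest entries are compared; no checksums are recomputed from the files.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def read_manifest(path) -> Dict[str, str]:
    """Parse a BagIt manifest into {payload path: checksum}."""
    entries = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each line is "<checksum><whitespace><path>"; paths may contain spaces.
        checksum, filepath = line.split(None, 1)
        entries[filepath.strip()] = checksum.lower()
    return entries


def compare_split_to_original(original_bag, split_bags: Iterable,
                              manifest_name: str = "manifest-md5.txt"
                              ) -> Dict[str, List[str]]:
    """Report payload files that are missing, extra, or changed after a split."""
    original = read_manifest(Path(original_bag) / manifest_name)
    combined: Dict[str, str] = {}
    for bag in split_bags:
        combined.update(read_manifest(Path(bag) / manifest_name))
    return {
        # In the original manifest but in none of the split bags.
        "missing": sorted(set(original) - set(combined)),
        # In a split bag but not in the original manifest.
        "extra": sorted(set(combined) - set(original)),
        # Present on both sides but with different checksums.
        "mismatched": sorted(p for p in original
                             if p in combined and original[p] != combined[p]),
    }
```

An empty list under all three keys would indicate that, at the manifest level, the split bags together account for exactly the original payload.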

Unsplitting a Bag

$ python bag-split.py unsplit <DIRECTORY CONTAINING BAGS>

This command creates a new directory, MERGED_BAG, by merging the bags found inside <DIRECTORY CONTAINING BAGS> into a single reconstructed bag. The tool will check that the reconstructed bag is a faithful reconstruction of the original. The name of MERGED_BAG is derived from the input directory: if the input directory's name ends in "_split" (the LoC tool adds this suffix when it splits a bag), the resulting directory has the same name with "_split" removed; otherwise, "_merged" is appended to the input directory's name.
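
The naming rule described above can be sketched as a small function. This is an illustration of the rule as stated, not the tool's actual code; the function name merged_bag_name is hypothetical, and stripping a trailing path separator is an added assumption.

```python
def merged_bag_name(input_dir: str) -> str:
    """Derive the output directory name from the input directory name."""
    # Assumption: a trailing path separator is not part of the name.
    name = input_dir.rstrip("/\\")
    suffix = "_split"
    if name.endswith(suffix):
        # "_split" was added by the LoC tool; removing it restores the name.
        return name[:-len(suffix)]
    # Otherwise mark the result as a merged bag.
    return name + "_merged"
```

For example, an input directory named mybag_split yields mybag, while an input directory named mybag yields mybag_merged.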

License

See LICENSE.txt
