
Commit

Do _not_ revert bag when updating a bag with strict enabled.
Update docs and unit tests.
mikedarcy committed May 10, 2024
1 parent f6342e0 commit efc1d33
Showing 6 changed files with 41 additions and 24 deletions.
5 changes: 3 additions & 2 deletions CHANGELOG.md
@@ -8,8 +8,9 @@ encoded for the `filename` field and that whitespace (` ` and `\t`) should _only
* NOTE: As a best practice, applications should always pre-encode URLs that are added to `fetch.txt` and not rely on `bdbag` to do so, since only whitespace will be encoded.
* Added a new option `strict` to the `make_bag` API function, along with a corresponding CLI argument. If `strict` is enabled,
`make_bag` will automatically validate a newly created or updated bag for structural validity and fail if the resultant bag is invalid.
-This can be used to ensure that a bag is not persisted without payload file manifests. Additionally, if the created or
-updated output bag is not structurally valid, the bag will subsequently be reverted back to a normal directory and a BagValidationError exception will be thrown.
+This can be used to ensure that a bag is not persisted without payload file manifests. Additionally, if a created output
+bag is not structurally valid, the bag will subsequently be reverted back to a normal directory. An updated bag will _not_ be reverted.
+In either case, a BagValidationError exception will be thrown.

## 1.7.2

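The `NOTE` in the CHANGELOG context above recommends that applications pre-encode URLs destined for `fetch.txt` rather than rely on `bdbag`, which only encodes whitespace. A minimal sketch of such pre-encoding with Python's standard library follows. The function name `pre_encode_fetch_url` is hypothetical, and encoding only the URL path (treating `%` as safe so existing escapes are not double-encoded) is an assumption about what a typical application needs, not something `bdbag` prescribes.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def pre_encode_fetch_url(url):
    """Hypothetical helper: percent-encode a URL's path before writing it to fetch.txt.

    Assumptions: only the path is encoded; '%' is kept safe so that
    already-encoded sequences such as '%20' are not double-encoded;
    scheme, host, query and fragment are passed through unchanged.
    """
    parts = urlsplit(url)
    # quote() encodes non-ASCII characters as UTF-8 percent escapes.
    encoded_path = quote(parts.path, safe="/%")
    return urlunsplit(parts._replace(path=encoded_path))
```

For example, `pre_encode_fetch_url("https://example.org/data/my file.txt")` yields `https://example.org/data/my%20file.txt`. Because `%` is treated as safe, a stray literal `%` that is not part of an escape sequence passes through unchanged, so inputs that may contain one need separate handling.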
8 changes: 5 additions & 3 deletions bdbag/bdbag_api.py
@@ -339,10 +339,12 @@ def make_bag(bag_path,
         try:
             bag._validate_structure()
         except bdbagit.BagValidationError as e:
-            error = ("The newly created/updated bag is not structurally valid and strict checking has been requested. "
-                     "The bag will be reverted back to a normal directory. Exception: %s\n") % get_typed_exception(e)
+            error = ("The newly created/updated bag is not structurally valid and strict checking has been requested.%s"
+                     " Exception: %s\n" % (" The bag will be reverted back to a normal directory." if not update else "",
+                                           get_typed_exception(e)))
             logger.error(error)
-            revert_bag(bag_path)
+            if not update:
+                revert_bag(bag_path)
             raise bdbagit.BagValidationError(error)

     return bag
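To make the new control flow in the `except` branch easier to follow, the sketch below models it outside of `bdbag`. It is an approximation under stated assumptions, not `bdbag` code: `BagValidationError` is a local stand-in for `bdbagit.BagValidationError`, `revert_bag` is injected as a callback so the call can be observed, the `"TypeName: message"` formatting only approximates `get_typed_exception(e)`, and `handle_strict_failure` and `strict_failure_outcome` are hypothetical names.

```python
import logging

logger = logging.getLogger(__name__)


class BagValidationError(Exception):
    """Local stand-in for bdbagit.BagValidationError (assumption, not bdbag's class)."""


def handle_strict_failure(bag_path, update, exc, revert_bag):
    """Hypothetical model of the patched except-branch in make_bag().

    The message mentions reverting only for newly created bags, revert_bag is
    called only when update is false, and the error is raised in both cases.
    """
    # "TypeName: message" approximates get_typed_exception(e); an assumption.
    typed = "%s: %s" % (type(exc).__name__, exc)
    error = ("The newly created/updated bag is not structurally valid and strict checking has been requested.%s"
             " Exception: %s\n" % (" The bag will be reverted back to a normal directory." if not update else "",
                                   typed))
    logger.error(error)
    if not update:
        revert_bag(bag_path)  # only a freshly created bag is reverted
    raise BagValidationError(error)


def strict_failure_outcome(update):
    """Run the model once; return (error message, list of reverted paths)."""
    reverted = []
    try:
        handle_strict_failure("example_bag", update,
                              ValueError("missing payload manifest"), reverted.append)
    except BagValidationError as e:
        return str(e), reverted
    raise AssertionError("BagValidationError was not raised")
```

In this model, `strict_failure_outcome(False)` records one revert and returns a message that mentions it, while `strict_failure_outcome(True)` records no revert but still raises, which matches the updated CHANGELOG entry.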
5 changes: 3 additions & 2 deletions bdbag/bdbag_cli.py
@@ -87,8 +87,9 @@ def parse_cli():
strict_arg, action="store_true",
help="Automatically validate a newly created or updated bag for structural validity and fail if the resultant "
"bag is invalid. This can be used to ensure that a bag is not persisted without payload file manifests. "
-"If this flag is set and the created or updated output bag is not structurally valid, the bag will "
-"subsequently be reverted back to a normal directory and an error returned.")
+"If this flag is set and a created output bag is not structurally valid, the bag will "
+"subsequently be reverted back to a normal directory. An updated bag will not be reverted. "
+"In either case, an error is returned.")

revert_arg = "--revert"
standard_args.add_argument(
28 changes: 14 additions & 14 deletions doc/api.md
@@ -195,21 +195,21 @@ make_bag(bag_path,
Creates or updates the bag denoted by the `bag_path` argument.

##### Parameters
| Param | Type | Description |
|----------------------|-----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| bag_path | `string` | A normalized, absolute path to a bag directory. |
| algs | `list` | A list of checksum algorithms to use for calculating file fixities. When creating a bag, only the checksums present in this variable will be used. When updating a bag, this function will take the union of any existing bag algorithms and what is specified by this parameter, ***except*** when the `prune_manifests` parameter is specified, in which case then only the algorithms specifed by this parameter will be used. |
| update | `boolean` | If `bag_path` represents an existing bag, update it. If this parameter is not specified when invoking this function on an existing bag, the function is essentially a NOOP and will emit a logging message to that effect. |
| save_manifests | `boolean` | Defaults to `True`. If true, saves all manifests, recalculating all checksums and regenerating `fetch.txt`. If false, only tagfile manifest checksums are recalculated. Use this flag as an optimization (to avoid recalculating payload file checksums) when only the bag metadata has been changed. This parameter is only meaningful during update operations, otherwise it is ignored. |
| prune_manifests | `boolean` | Removes any file and tagfile manifests for checksums that are not listed in the `algs` variable. This parameter is only meaningful during update operations, otherwise it is ignored. |
| metadata | `dict` | A dictionary of key-value pairs that will be written directly to the bag's 'bag-info.txt' file. |
| metadata_file | `string` | A JSON file representation of metadata that will be written directly to the bag's 'bag-info.txt' file. The format of this metadata is described [here](./config.md#metadata). |
| remote_file_manifest | `string` | A path to a JSON file representation of remote file entries that will be used to add remote files to the bag file manifest(s) and used to create the bag's `fetch.txt`. The format of this file is described [here](./config.md/#remote-file-manifest). |
| config_file | `string` | A JSON file representation of configuration data that is used during bag creation and update. The format of this file is described [here](./config.md#bdbag.json). |
| ro_metadata | `dict` | A dictionary that will be used to serialize data into one or more JSON files into the bag's `metadata` directory. The format of this metadata is described [here](./config.md#ro_metadata). |
| ro_metadata_file | `string` | A path to a JSON file representation of RO metadata that will be used to serialize data into one or more JSON files into the bag's `metadata` directory. The format of this metadata is described [here](./config.md#ro_metadata). |
| Param | Type | Description |
|----------------------|-----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| bag_path | `string` | A normalized, absolute path to a bag directory. |
| algs | `list` | A list of checksum algorithms to use for calculating file fixities. When creating a bag, only the checksums present in this variable will be used. When updating a bag, this function will take the union of any existing bag algorithms and what is specified by this parameter, ***except*** when the `prune_manifests` parameter is specified, in which case then only the algorithms specifed by this parameter will be used. |
| update | `boolean` | If `bag_path` represents an existing bag, update it. If this parameter is not specified when invoking this function on an existing bag, the function is essentially a NOOP and will emit a logging message to that effect. |
| save_manifests | `boolean` | Defaults to `True`. If true, saves all manifests, recalculating all checksums and regenerating `fetch.txt`. If false, only tagfile manifest checksums are recalculated. Use this flag as an optimization (to avoid recalculating payload file checksums) when only the bag metadata has been changed. This parameter is only meaningful during update operations, otherwise it is ignored. |
| prune_manifests | `boolean` | Removes any file and tagfile manifests for checksums that are not listed in the `algs` variable. This parameter is only meaningful during update operations, otherwise it is ignored. |
| metadata | `dict` | A dictionary of key-value pairs that will be written directly to the bag's 'bag-info.txt' file. |
| metadata_file | `string` | A JSON file representation of metadata that will be written directly to the bag's 'bag-info.txt' file. The format of this metadata is described [here](./config.md#metadata). |
| remote_file_manifest | `string` | A path to a JSON file representation of remote file entries that will be used to add remote files to the bag file manifest(s) and used to create the bag's `fetch.txt`. The format of this file is described [here](./config.md/#remote-file-manifest). |
| config_file | `string` | A JSON file representation of configuration data that is used during bag creation and update. The format of this file is described [here](./config.md#bdbag.json). |
| ro_metadata | `dict` | A dictionary that will be used to serialize data into one or more JSON files into the bag's `metadata` directory. The format of this metadata is described [here](./config.md#ro_metadata). |
| ro_metadata_file | `string` | A path to a JSON file representation of RO metadata that will be used to serialize data into one or more JSON files into the bag's `metadata` directory. The format of this metadata is described [here](./config.md#ro_metadata). |
| idempotent | `boolean` | If `True`, date and time specific metadata such as `Bagging-Date` and `Bagging-Time` will be _removed_ (if present) from `bag-info.txt`. This value defaults to `False` if not passed via argument. However, a global override default value of `True` can be enabled in the [config file](./config.md). NOTE: use of `ro_metadata` and `ro_metadata_file` in conjunction with `idempotent` is not recommended at this time due to the generated RO Metadata not being compatible with bag idempotency. |
-| strict | `boolean` | If `True`, automatically validate a newly created or updated bag for structural validity and fail if the resultant bag is invalid. This can be used to ensure that a bag is not persisted without payload file manifests. Furthermore, if this argument is `True` and the created or updated output bag is not structurally valid, the bag will subsequently be reverted back to a normal directory and a BagValidationError exception is thrown. |
+| strict | `boolean` | If `True`, automatically validate a newly created or updated bag for structural validity and fail if the resultant bag is invalid. This can be used to ensure that a bag is not persisted without payload file manifests. Furthermore, if this argument is `True` and a created output bag is not structurally valid, the bag will subsequently be reverted back to a normal directory. An updated bag will not be reverted. In either case, a BagValidationError exception is thrown. |

**Returns**: `bag` - An instantiated [bagit-python](https://github.com/LibraryOfCongress/bagit-python/blob/master/bagit.py) `bag` compatible class object.

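The `strict` description ties structural validity to the presence of payload file manifests. This commit does not show what `bag._validate_structure()` actually checks, so the following is only a rough, hypothetical illustration of that one condition. Per the BagIt naming convention (RFC 8493), payload manifests are named `manifest-<algorithm>.txt`, distinct from tag manifests named `tagmanifest-<algorithm>.txt`. The function name `payload_manifest_algorithms` is made up for this example and is not part of the `bdbag` API.

```python
import re

# BagIt (RFC 8493) payload manifests are named "manifest-<algorithm>.txt".
# Tag manifests ("tagmanifest-<algorithm>.txt") do not match because of the ^ anchor.
_PAYLOAD_MANIFEST = re.compile(r"^manifest-([A-Za-z0-9-]+)\.txt$")


def payload_manifest_algorithms(filenames):
    """Hypothetical helper: return the sorted algorithms that have a payload
    manifest among the given top-level bag file names."""
    matches = (_PAYLOAD_MANIFEST.match(name) for name in filenames)
    return sorted({m.group(1) for m in matches if m})
```

A caller could pass `os.listdir(bag_path)` to it, for instance, to see which algorithms a bag still has payload manifests for after a failed strict update, since such a bag is now left in place rather than reverted. An empty result corresponds to the "no payload file manifests" case that `strict` is meant to catch, though the real validation may check more than this.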