Fixed fetch issue #103

Closed
wants to merge 5 commits into from
Changes from 2 commits
118 changes: 59 additions & 59 deletions src/main/java/gov/loc/repository/bagit/writer/PayloadWriter.java
@@ -21,73 +21,73 @@
* Responsible for writing out the bag payload to the filesystem
*/
public final class PayloadWriter {
private static final Logger logger = LoggerFactory.getLogger(PayloadWriter.class);
private static final Version VERSION_2_0 = new Version(2, 0);
private static final ResourceBundle messages = ResourceBundle.getBundle("MessageBundle");
private static final Logger logger = LoggerFactory.getLogger(PayloadWriter.class);
private static final Version VERSION_2_0 = new Version(2, 0);
private static final ResourceBundle messages = ResourceBundle.getBundle("MessageBundle");

Contributor

As far as I can tell, this file only contains formatting changes. Is that correct?

private PayloadWriter(){
//intentionally left empty
}
/*
//intentionally left empty
}

/*
* Write the payload files in the data directory or under the root directory depending on the version
*/
*/
static Path writeVersionDependentPayloadFiles(final Bag bag, final Path outputDir) throws IOException{
Path bagitDir = outputDir;
//@Incubating
Path bagitDir = outputDir;
//@Incubating
if(bag.getVersion().isSameOrNewer(VERSION_2_0)){
bagitDir = outputDir.resolve(".bagit");
Files.createDirectories(bagitDir);
writePayloadFiles(bag.getPayLoadManifests(), bag.getItemsToFetch(), outputDir, bag.getRootDir());
}
bagitDir = outputDir.resolve(".bagit");
Files.createDirectories(bagitDir);
writePayloadFiles(bag.getPayLoadManifests(), bag.getItemsToFetch(), outputDir, bag.getRootDir());
}
else{
final Path dataDir = outputDir.resolve("data");
Files.createDirectories(dataDir);
writePayloadFiles(bag.getPayLoadManifests(), bag.getItemsToFetch(), dataDir, bag.getRootDir().resolve("data"));
final Path dataDir = outputDir.resolve("data");
Files.createDirectories(dataDir);
writePayloadFiles(bag.getPayLoadManifests(), bag.getItemsToFetch(), dataDir, bag.getRootDir().resolve("data"));
}

return bagitDir;
}

return bagitDir;
}

/**
* Write the payload <b>file(s)</b> to the output directory
*
* @param payloadManifests the set of objects representing the payload manifests
* @param fetchItems the list of items to exclude from writing in the output directory because they will be fetched
* @param outputDir the data directory of the bag
* @param bagDataDir the data directory of the bag
*
* @throws IOException if there was a problem writing a file
*/
public static void writePayloadFiles(final Set<Manifest> payloadManifests, final List<FetchItem> fetchItems, final Path outputDir, final Path bagDataDir) throws IOException{
logger.info(messages.getString("writing_payload_files"));
final Set<Path> fetchPaths = getFetchPaths(fetchItems);


/**
* Write the payload <b>file(s)</b> to the output directory
*
* @param payloadManifests the set of objects representing the payload manifests
* @param fetchItems the list of items to exclude from writing in the output directory because they will be fetched
* @param outputDir the data directory of the bag
* @param bagDataDir the data directory of the bag
*
* @throws IOException if there was a problem writing a file
*/
public static void writePayloadFiles(final Set<Manifest> payloadManifests, final List<FetchItem> fetchItems, final Path outputDir, final Path bagDataDir) throws IOException {
logger.info(messages.getString("writing_payload_files"));
final Set<Path> fetchPaths = getFetchPaths(fetchItems, bagDataDir);

for(final Manifest payloadManifest : payloadManifests){
for(final Path payloadFile : payloadManifest.getFileToChecksumMap().keySet()){
final Path relativePayloadPath = bagDataDir.relativize(payloadFile);
final Path relativePayloadPath = bagDataDir.relativize(payloadFile);

if(fetchPaths.contains(relativePayloadPath.normalize())) {
logger.info(messages.getString("skip_fetch_item_when_writing_payload"), payloadFile);
}
logger.info(messages.getString("skip_fetch_item_when_writing_payload"), payloadFile);
}
else {
final Path writeToPath = outputDir.resolve(relativePayloadPath);
logger.debug(messages.getString("writing_payload_file_to_path"), payloadFile, writeToPath);
final Path parent = writeToPath.getParent();
if(parent != null){
Files.createDirectories(parent);
}
Files.copy(payloadFile, writeToPath, StandardCopyOption.COPY_ATTRIBUTES, StandardCopyOption.REPLACE_EXISTING);
}
}
}
}
private static Set<Path> getFetchPaths(final List<FetchItem> fetchItems){
final Set<Path> fetchPaths = new HashSet<>();
for(final FetchItem fetchItem : fetchItems) {
fetchPaths.add(fetchItem.getPath());
}
return fetchPaths;
}
final Path writeToPath = outputDir.resolve(relativePayloadPath);
logger.debug(messages.getString("writing_payload_file_to_path"), payloadFile, writeToPath);
final Path parent = writeToPath.getParent();
if(parent != null){
Files.createDirectories(parent);
}
Files.copy(payloadFile, writeToPath, StandardCopyOption.COPY_ATTRIBUTES, StandardCopyOption.REPLACE_EXISTING);
}
}
}
}

private static Set<Path> getFetchPaths(final List<FetchItem> fetchItems, final Path bagDataDir){
final Set<Path> fetchPaths = new HashSet<>();
fetchItems.forEach((fetchItem) -> {
fetchPaths.add(bagDataDir.relativize(bagDataDir.getParent().resolve(fetchItem.getPath())));
});
return fetchPaths;
}
}
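
The path handling changed in `getFetchPaths` above can be illustrated with a small, self-contained sketch. It assumes that fetch entries are written relative to the bag root (for example `data/dir/file.txt`) and that the data directory is a direct child of the bag root. The class `FetchPathSketch` and the method `toDataDirRelative` are hypothetical names used only for illustration; they are not part of the library.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not library code: shows why fetch paths need to be
// relativized against the data directory before they are compared.
class FetchPathSketch {

    // Convert bag-root-relative fetch paths into paths relative to the data directory.
    // Assumes bagDataDir has a parent (the bag root).
    static Set<Path> toDataDirRelative(final List<Path> fetchPaths, final Path bagDataDir) {
        final Set<Path> relativePaths = new HashSet<>();
        for (final Path fetchPath : fetchPaths) {
            // Anchor the fetch path at the bag root, then express it relative to the data directory.
            final Path underBagRoot = bagDataDir.getParent().resolve(fetchPath);
            relativePaths.add(bagDataDir.relativize(underBagRoot));
        }
        return relativePaths;
    }

    public static void main(final String[] args) {
        final Path bagDataDir = Paths.get("bag", "data");
        final Path payloadFile = bagDataDir.resolve("dir").resolve("file.txt");
        final List<Path> fetchPaths = Arrays.asList(Paths.get("data", "dir", "file.txt"));

        // The writer compares against the payload path relative to the data directory: dir/file.txt
        final Path relativePayloadPath = bagDataDir.relativize(payloadFile);

        // Comparing against the raw fetch paths never matches, because they still start with "data".
        System.out.println(new HashSet<>(fetchPaths).contains(relativePayloadPath)); // false

        // After relativizing against the data directory, the fetch item is recognized.
        System.out.println(toDataDirRelative(fetchPaths, bagDataDir).contains(relativePayloadPath)); // true
    }
}
```

Under these assumptions, a fetch item is only skipped during writing once both sides of the comparison are expressed relative to the same directory.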
32 changes: 16 additions & 16 deletions src/main/resources/MessageBundle.properties
@@ -23,30 +23,30 @@ different_case=The bag contains two files that differ only in case. This can cau
different_normalization=The bag contains two files that differ only in the normalization. This can cause verification to fail on some systems, and general user confusion.
extra_lines_in_bagit_files=The bagit specification says it must only contain 2 lines. However, some implementations have decided to ignore this which may cause compatibility issues
leading_dot_slash=A manifest lists all data files as relative to the bag root directory, it is superfluous to therefore specify it with a dot.
non_standard_algorithm=The checksum algorithm used does not come standard with the Java runtime. Consider using SHA512 instead.
md5sum_tool_generated_manifest=The manifest was created using a using checksum utilities such as those contained in the GNU Coreutils package (md5sum, sha1sum, etc.), collectively referred to here as 'md5sum'. This creates slight differences in generated manifests that can cause problems in some implementations.
missing_tag_manifest=The tag manifest guards against a truncated payload manifest as well as other potental problems and is always recommened that it be included.
non_standard_algorithm=The checksum algorithm used does not come standard with the Java runtime. Consider using SHA-512 instead.
md5sum_tool_generated_manifest=The manifest was created using checksum utilities such as those contained in the GNU Coreutils package (md5sum, sha1sum, etc.), collectively referred to here as 'md5sum'. This creates slight differences in generated manifests that can cause problems in some implementations.
missing_tag_manifest=The tag manifest guards against a truncated payload manifest as well as other potential problems and is always recommened that it be included.
old_bagit_version=The bagit specification version is not the newest. Consider converting to the latest version.
os_specific_files=Files created by the operating system (OS) for its own use. They are non-portable across OS versions and should not be included in any manifest. Examples Thumbs.db on Windows or .DS_Store on OS X
payload_oxum_missing=It is recommended to always include the Payload-Oxum in the bag metadata since it allows for a 'quick verification' of the bag.
tag_files_encoding=It is recommended to always use UTF-8.
weak_checksum_algorithm=The checksum algorithm used is known to be weak. We recommend using SHA512.
weak_checksum_algorithm=The checksum algorithm used is known to be weak. We recommend using SHA-512.

#for BagLinter.java
checking_encoding_problems=Checking encoding problems.
checking_latest_version=checking for latest version.
checking_manifest_problems=checking manifests for problems.
checking_metadata_problems=checking bag metadata for problems.
skipping_check_extra_lines=skipping check for extra lines in bagit files.
checking_extra_lines=checking if [{}] contains more than 2 lines.
checking_latest_version=Checking for latest version.
checking_manifest_problems=Checking manifests for problems.
checking_metadata_problems=Checking bag metadata for problems.
skipping_check_extra_lines=Skipping check for extra lines in bagit files.
checking_extra_lines=Checking if [{}] contains more than 2 lines.
extra_lines_warning=The bagit specification states that the bagit.txt file must contain exactly 2 lines. However we found [{}] lines, some implementations will ignore this but may cause incompatibility issues with other tools.

#for BagProfileChecker.java
checking_fetch_file_allowed=Checking if the fetch file is allowed for bag [{}].
checking_metadata_entry_required=Checking if [{}] is required in the bag metadata.
check_values_acceptable=Checking if all the values listed for [{}] are acceptable.
check_required_manifests_present=Checking if all the required manifests are present.
required_tag_manifest_type_not_present=Required tagmanifest type [{}] was not present.
required_tag_manifest_type_not_present=Required tag manifest type [{}] was not present.
required_manifest_type_not_present=Required manifest type [{}] was not present.
checking_required_tag_file_exists=Checking if all the required tag files exist.

@@ -74,7 +74,7 @@ different_case_warning=In manifest [{}], path [{}] is the same as another path e
manifest_line_violated_spec_error=Manifest contains line [{}] which does not follow the specified form of <CHECKSUM> <PATH>
md5sum_generated_line_warning=Path [{}] starts with a *, which means it was generated with a non-bagit tool. It is recommended to remove the * in order to conform to the bagit specification.
cannot_access_parent_path_error=Could not access parent folder of [{}].
different_normalization_in_manifest_warning=File [{}] has a different normalization then what is specified in the manifest.
different_normalization_in_manifest_warning=File [{}] has a different normalization than what is specified in the manifest.
bag_within_bag_warning=We stronger recommend not storing a bag within a bag as it is known to cause problems.
leading_dot_slash_warning=In manifest [{}] line [{}] is a non-normalized path.
os_specific_files_warning=In manifest [{}] line [{}] contains a OS specific file.
@@ -139,7 +139,7 @@ found_metadata_file=Found metadata file [{}].

#for TagFileReader.java
removing_asterisk=Encountered path that was created by non-bagit tool. Removing * from path. Please remove all * from manifest files!
blackslash_used_as_path_separator_error=[{}] is invalid due to the use of the path separactor [\\]!
blackslash_used_as_path_separator_error=[{}] is invalid due to the use of the path separator [\\]!
malicious_path_error=[{}] is trying to be malicious and access a file outside the bag!
invalid_url_format_error=URL [{}] is invalid!

@@ -161,7 +161,7 @@ checking_checksums=Checking file [{}] to see if checksum matches [{}].
corrupt_checksum_error=File [{}] is suppose to have a [{}] hash of [{}] but was computed [{}].

#for FileCoundAndTotalSizeVisitor.java
file_size_in_bytes=File [{}] hash a size of [{}] bytes.
file_size_in_bytes=File [{}] has a size of [{}] bytes.

#for MandatoryVerifier.java
checking_fetch_items_exist=Checking if all [{}] items in fetch.txt exist in the [{}] directory.
@@ -186,11 +186,11 @@ checking_file_in_at_least_one_manifest=Checking if all payload files (files in [
checking_file_in_all_manifests=Checking if all payload files (files in [{}] directory) are listed in all manifests.

#for QuickVerifier.java
found_payload_oxum=Found payload-oxum [{}] for bag [{}].
found_payload_oxum=Found Payload-Oxum [{}] for bag [{}].
payload_oxum_missing_error=Payload-Oxum does not exist in bag!
parse_size_in_bytes=Parsing [{}] for the total byte size of the payload oxum.
parse_number_of_files=Parsing [{}] for the number of files to find in the payload directory.
compare_payload_oxums=supplied payload-oxum: [{}], Calculated payload-oxum: [{}.{}], for payload directory [{}].
compare_payload_oxums=Supplied payload-oxum: [{}], Calculated payload-oxum: [{}.{}], for payload directory [{}].
invalid_total_size_error=Invalid total size. Expected [{}] but calculated [{}]!
invalid_file_cound_error=Invalid file count. Expected [{}] but found [{}]!

@@ -200,7 +200,7 @@ writing_line_to_file=Writing line [{}] to [{}]

#for BagWriter.java
writing_payload_files=Writing payload files.
upsert_payload_oxum=Upserting payload-oxum.
upsert_payload_oxum=Inserting payload-oxum.
Contributor

No, upserting is correct. It stands for inserting or updating.

Contributor Author

Sorry, and thanks for extending my English vocabulary. 👍

Contributor

johnscancella, Jan 17, 2018

👍 And thank you for catching all my English mistakes!

Member

Who is the intended audience for this message? If it is not developers, I'd be inclined to use something like “updating”, since “upsert” is generally only familiar to people who work with databases a lot.

Contributor

It is a DEBUG message, which I generally wrote for other programmers.
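
For readers unfamiliar with the term discussed in this thread, "upsert" (insert or update) can be sketched roughly as follows. This is only an illustration of the semantics: the `Map`-based metadata and the names `UpsertSketch` and `upsert` are assumptions made for the example and do not reflect how the library actually stores bag metadata.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "upsert" semantics: insert when absent, update when present.
class UpsertSketch {

    // Returns the previous value for the key, or null if the key was newly inserted.
    static String upsert(final Map<String, String> metadata, final String key, final String value) {
        return metadata.put(key, value);
    }

    public static void main(final String[] args) {
        final Map<String, String> metadata = new HashMap<>();
        // First call inserts, because the key does not exist yet.
        System.out.println(upsert(metadata, "Payload-Oxum", "1024.3")); // null
        // Second call updates the existing entry instead of adding a duplicate.
        System.out.println(upsert(metadata, "Payload-Oxum", "2048.5")); // 1024.3
        System.out.println(metadata.size()); // 1
    }
}
```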

writing_bagit_file=Writing the bagit.txt file.
writing_payload_manifests=Writing the payload manifest(s).
writing_bag_metadata=Writing the bag metadata.