Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BagLinter always adds 'different_case' warning #119

Open
rvanheest opened this issue Jul 7, 2018 · 3 comments · May be fixed by #125
Open

BagLinter always adds 'different_case' warning #119

rvanheest opened this issue Jul 7, 2018 · 3 comments · May be fixed by #125

Comments

@rvanheest
Copy link

I just upgraded to v1.2.0 of this library. With the upgraded stuff on BagIt v1.0 in #118, a new bug has entered the library:

When using the BagLinter, the warning different_case will always be shown. Given it's description:

The bag contains two files that differ only in case. This can cause problems on a filesystem like the one used by apple (HFS).

I would not expect this to happen on a bag with only very distinct files.

I think the bug is as follows: in the code below a manifest file is read and for each line the path is added to the Set paths (after being converted to lowercase).

try(final BufferedReader reader = Files.newBufferedReader(manifestFile, encoding)){
final Set<String> paths = new HashSet<>();
String line = reader.readLine();
while(line != null){
String path = parsePath(line);
path = checkForManifestCreatedWithMD5SumTools(path, warnings, warningsToIgnore);
paths.add(path.toLowerCase());
checkForDifferentCase(path, paths, manifestFile, warnings, warningsToIgnore);
if(encoding.name().startsWith("UTF")){
checkNormalization(path, manifestFile.getParent(), warnings, warningsToIgnore);
}
checkForBagWithinBag(line, warnings, warningsToIgnore, isPayloadManifest);
checkForRelativePaths(line, warnings, warningsToIgnore, manifestFile);
checkForOSSpecificFiles(line, warnings, warningsToIgnore, manifestFile);
line = reader.readLine();
}
}

Immediately after that checkForDifferentCase is called, which checks whether paths.contains(path.toLowerCase()). Of course, this is always true, since path was just added to paths before this check.

/*
* Check that the same line doesn't already exist in the set of paths
*/
private static void checkForDifferentCase(final String path, final Set<String> paths, final Path manifestFile,
final Set<BagitWarning> warnings, final Collection<BagitWarning> warningsToIgnore){
if(!warningsToIgnore.contains(BagitWarning.DIFFERENT_CASE) && paths.contains(path.toLowerCase())){
logger.warn(messages.getString("different_case_warning"), manifestFile, path);
warnings.add(BagitWarning.DIFFERENT_CASE);
}
}

acdha added a commit to acdha/bagit-java that referenced this issue Jul 9, 2018
…ess#119)

This ensures that the test won’t flag the file it just added
for any case where the filename isn’t already lowercased.
acdha added a commit to acdha/bagit-java that referenced this issue Jul 9, 2018
…ess#119)

This ensures that the test won’t flag the file it just added
for any case where the filename isn’t already lowercased.
@acdha
Copy link
Member

acdha commented Jul 9, 2018

I created #120 simply to do the check first. Today has been rather busy so a second set of eyes on the code would be appreciated.

@rvanheest
Copy link
Author

@acdha From reasoning through the code this looks fine. But I wasn't able to test it with the code change in my own project, since I can't find a way to build this project to my local Maven repository directory. I keep hitting errors with the current gradle configuration.

What surprises me a bit is that no tests we added or changed. I was looking and running ManifestCheckerTest and saw that (even after the change was applied), DIFFERENT_CASE still showed up in the test results. I'm not sure if this is correct or not (you guys know much more about the whole context than me, of course), but it made me wonder whether this change is all there is to it.

@jscancella
Copy link
Contributor

jscancella commented Jul 12, 2018

@acdha I originally created this to catch potential errors like
HASH /data/someFile and HASH /data/SomeFile which on mac is the same file, but on other systems should be two different files. With that in mind, it should be giving a warning that they differ only by case.

After looking at this more, it looks like you are correct and this is indeed from adding it to the paths too early.

@jscancella jscancella linked a pull request Jul 22, 2018 that will close this issue
rvanheest pushed a commit to DANS-KNAW/dans-bagit-lib that referenced this issue Feb 20, 2019
rvanheest pushed a commit to DANS-KNAW/dans-bagit-lib that referenced this issue Feb 20, 2019
…with bagit-python. Also added tests for valid bags of each standard algorithm
rvanheest pushed a commit to DANS-KNAW/dans-bagit-lib that referenced this issue Feb 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants