Fetch files not listed in payload manifest #99

Closed
ThomasJejkal opened this issue Jan 12, 2018 · 5 comments

@ThomasJejkal
Contributor

Hi,

I'm using bagit-java (5.0.3), and today I reached a point where I'm wondering whether the library deviates (a little bit) from the BagIt standard. Let me try to formulate my problem in 'given-when-then' style. ;-)

Given

  • I plan to create a bag, which has to be partly fetched by the consumer
  • I've added one payload manifest containing one item
  • I've added two fetch items

When

  • I write the bag to disk

Then

  • The fetch items appear in 'fetch.txt'
  • The payload item appears in 'manifest-.txt'
  • None of the fetch items appear in 'manifest-.txt'

It also does not seem to be possible to provide hashes manually for files that should be fetched by the bag consumer. According to the example in the BagIt specification (see https://tools.ietf.org/html/draft-kunze-bagit-08#section-5.2), fetch items are listed in 'manifest-.txt'.
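To illustrate the expectation, here is a minimal sketch in plain Java (it does not use the bagit-java API). It formats manifest lines as "checksum filepath" and fetch.txt lines as "url length filepath", and reports fetch entries that have no manifest entry. The class name, method names, URLs, file names and checksum values are all hypothetical placeholders, not taken from the library or from a real bag.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FetchManifestSketch {

    /** One fetch.txt entry: URL, length in octets ("-" if unknown), path inside the bag. */
    static final class FetchEntry {
        final String url;
        final String length;
        final String bagPath;

        FetchEntry(String url, String length, String bagPath) {
            this.url = url;
            this.length = length;
            this.bagPath = bagPath;
        }
    }

    /** Formats fetch.txt lines as "url length filepath". */
    static List<String> fetchLines(List<FetchEntry> entries) {
        List<String> lines = new ArrayList<>();
        for (FetchEntry e : entries) {
            lines.add(e.url + " " + e.length + " " + e.bagPath);
        }
        return lines;
    }

    /** Formats payload manifest lines as "checksum filepath". */
    static List<String> manifestLines(Map<String, String> checksumByPath) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> e : checksumByPath.entrySet()) {
            lines.add(e.getValue() + " " + e.getKey());
        }
        return lines;
    }

    /** Returns the fetch paths that have no entry in the payload manifest. */
    static List<String> missingFromManifest(List<FetchEntry> entries,
                                            Map<String, String> checksumByPath) {
        List<String> missing = new ArrayList<>();
        for (FetchEntry e : entries) {
            if (!checksumByPath.containsKey(e.bagPath)) {
                missing.add(e.bagPath);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Placeholder checksums, not real digests of any file.
        Map<String, String> checksums = new LinkedHashMap<>();
        checksums.put("data/local.txt", "00000000000000000000000000000001");
        checksums.put("data/remote-a.txt", "00000000000000000000000000000002");

        List<FetchEntry> fetch = new ArrayList<>();
        fetch.add(new FetchEntry("http://example.org/remote-a.txt", "-", "data/remote-a.txt"));
        fetch.add(new FetchEntry("http://example.org/remote-b.txt", "-", "data/remote-b.txt"));

        manifestLines(checksums).forEach(System.out::println);
        fetchLines(fetch).forEach(System.out::println);
        // data/remote-b.txt is reported because it has no manifest entry.
        System.out.println("missing from manifest: " + missingFromManifest(fetch, checksums));
    }
}
```

A check like missingFromManifest is what I would expect to come back empty for a bag written with fetch items.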

Are there plans to fix this behaviour, or was it implemented that way on purpose?

Regards,
Thomas

@johnscancella
Contributor

Hi Thomas,

Thanks for submitting this. You are correct that fetch.txt, as defined in the specification, has no place for hashes, so the hashes of fetch items need to be put in the manifest file. I don't use the fetch feature much, so it doesn't get as much testing as it probably should. I took a look at the code, and some logic needs to be added to PayloadWriter that checks for FetchItems and doesn't add them to the data directory. That way you can safely add your fetch items to the manifest.
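A rough sketch of that idea in plain Java, not the actual PayloadWriter code: given the paths listed in the manifest and the paths of the fetch items, only the manifest entries that are not fetch items get copied into the data directory. The class and method names are hypothetical, and it assumes both collections use paths relative to the same base.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PayloadFilterSketch {

    /**
     * Returns the manifest paths that should be written to disk, i.e. all
     * manifest paths that are not also listed as fetch items.
     * Assumption: both collections are relative to the same base directory.
     */
    static List<Path> filesToCopy(List<Path> manifestPaths, Set<Path> fetchPaths) {
        // Normalize once so that "data/./a.txt" and "data/a.txt" compare equal.
        Set<Path> normalizedFetchPaths = new HashSet<>();
        for (Path fetchPath : fetchPaths) {
            normalizedFetchPaths.add(fetchPath.normalize());
        }

        List<Path> toCopy = new ArrayList<>();
        for (Path manifestPath : manifestPaths) {
            if (!normalizedFetchPaths.contains(manifestPath.normalize())) {
                toCopy.add(manifestPath);
            }
        }
        return toCopy;
    }

    public static void main(String[] args) {
        List<Path> manifestPaths = new ArrayList<>();
        manifestPaths.add(Paths.get("data/local.txt"));
        manifestPaths.add(Paths.get("data/remote.txt"));

        Set<Path> fetchPaths = new HashSet<>();
        fetchPaths.add(Paths.get("data/remote.txt"));

        // Only data/local.txt is left to copy; data/remote.txt stays in the
        // manifest but is not written to the data directory.
        System.out.println(filesToCopy(manifestPaths, fetchPaths));
    }
}
```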

johnscancella added a commit that referenced this issue Jan 12, 2018
@johnscancella
Contributor

This should be fixed in release 5.0.4.

@ThomasJejkal
Contributor Author

Hi John,

Thanks for adding the fix so fast. I've tested the new version, but there still seems to be an issue: during "real usage" (outside the test world) the check for fetch paths does not work. I guess the cause is the handling of relative paths, but let's go back to 'given-when-then' style:

Given

  • I add a fetch file with path Paths.get("default/out.html") and a payload element with path Paths.get("/Users/jejkal/NetBeansProjects/RepoInteropTool/testBag/data/default/out.html"). I have to do it that way because it is the only way to make the FetchFile check in PayloadWriter work, as it searches for existing paths relative to the bag data dir.

When

  • I write the bag to disk

Then

  • manifest-md5.txt contains the correct entry with relative path data/default/out.html, because the payload path added before is absolute and can be successfully relativized to the bag root.
  • fetch.txt contains the wrong entry with relative path ../default/out.html, because the relative path used for the FetchFile is assumed to be relative to the working directory, which in my case is located at '../'. Thus, relativizing to the bag root cannot work as expected.
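The effect can be reproduced with plain java.nio.file.Path calls. This is only a sketch of the mechanism as I understand it, not the library's code: the class name, the helper method and the /work directories are hypothetical, and the working directory is passed in explicitly instead of being taken from the JVM so that the example is deterministic.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class RelativizeSketch {

    /**
     * Makes the given path absolute against the working directory (absolute
     * paths are returned unchanged by resolve) and then expresses it relative
     * to the bag root.
     */
    static Path relativeToBagRoot(Path workingDir, Path bagRoot, Path path) {
        Path absolute = workingDir.resolve(path).normalize();
        return bagRoot.relativize(absolute);
    }

    public static void main(String[] args) {
        // Hypothetical layout: the working directory is the parent of the bag root.
        Path workingDir = Paths.get("/work");
        Path bagRoot = Paths.get("/work/testBag");

        // Absolute payload path inside the bag: relativizes cleanly.
        Path payload = Paths.get("/work/testBag/data/default/out.html");
        System.out.println(relativeToBagRoot(workingDir, bagRoot, payload));

        // Relative fetch path: it is resolved against the working directory
        // first, so the result points outside the bag root.
        Path fetch = Paths.get("default/out.html");
        System.out.println(relativeToBagRoot(workingDir, bagRoot, fetch));
    }
}
```

On a Unix-like file system the first call yields data/default/out.html and the second yields ../default/out.html, which matches the two entries described above.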

All other combinations, e.g. using Paths.get("data/default/out.html") or Paths.get("/Users/jejkal/NetBeansProjects/RepoInteropTool/testBag/data/default/out.html") as the FetchFile path, cause PayloadWriter not to identify the payload as a FetchFile, because 'fetchPaths.contains(relativePayloadPath.normalize())' assumes that the list contains paths relative to the bag data dir.

In the test world everything works fine because in PayloadTestWriterTest#testWritePayloadFilesMinusFetchFiles() you provide 'rootDir' as the argument 'bagDataDir', whereas PayloadWriter#writeVersionDependentPayloadFiles(final Bag bag, final Path outputDir) uses bag.getRootDir().resolve("data").
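A small sketch of why the check behaves differently in the two situations. It is built around the quoted expression 'fetchPaths.contains(relativePayloadPath.normalize())'; how relativePayloadPath is computed here (by relativizing the payload file against bagDataDir) is my assumption, and the class name, method name and /work paths are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Set;

class FetchCheckSketch {

    /**
     * Assumed shape of the check: the payload file is made relative to
     * bagDataDir and then looked up in the set of fetch paths.
     */
    static boolean isFetchItem(Set<Path> fetchPaths, Path bagDataDir, Path payloadFile) {
        Path relativePayloadPath = bagDataDir.relativize(payloadFile);
        return fetchPaths.contains(relativePayloadPath.normalize());
    }

    public static void main(String[] args) {
        Path bagRoot = Paths.get("/work/testBag");
        Path dataDir = bagRoot.resolve("data");
        Path payload = Paths.get("/work/testBag/data/default/out.html");

        Set<Path> relativeToData = Collections.singleton(Paths.get("default/out.html"));
        Set<Path> relativeToRoot = Collections.singleton(Paths.get("data/default/out.html"));

        // Real usage: the data dir is passed, so only data-dir-relative fetch paths match.
        System.out.println(isFetchItem(relativeToData, dataDir, payload)); // true
        System.out.println(isFetchItem(relativeToRoot, dataDir, payload)); // false

        // Test-like usage: the bag root is passed as 'bagDataDir', so a
        // root-relative fetch path matches instead.
        System.out.println(isFetchItem(relativeToRoot, bagRoot, payload)); // true
    }
}
```

So the outcome depends entirely on which base directory the fetch paths and the 'bagDataDir' argument agree on.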

Thanks for your help.

Regards,
Thomas

@johnscancella
Copy link
Contributor

Thanks for finding this. Now whatever path you give in the fetch item should be what is used in fetch.txt. This change will be in the 5.0.5 release.
