Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix payload digest for chunked response in test warc #97

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

nlevitt
Copy link
Contributor

@nlevitt nlevitt commented Oct 25, 2019

exposes #74

@nlevitt
Copy link
Contributor Author

nlevitt commented Oct 25, 2019

sha1 computed like so:

$ warcio extract --payload test/data/example-iana.org-chunked.warc 405 | sha1sum
8846f23ce943a3b70089f86345626778cd93f11e  -

@nlevitt nlevitt changed the title fix payload digest for chunked response fix payload digest for chunked response in test waerc Oct 25, 2019
@nlevitt nlevitt changed the title fix payload digest for chunked response in test waerc fix payload digest for chunked response in test warc Oct 25, 2019
@nlevitt
Copy link
Contributor Author

nlevitt commented Oct 25, 2019

Sorry I don't have a fix for the bug :)

@wumpus
Copy link
Collaborator

wumpus commented Oct 26, 2019

Bug reports lacking a fix are welcome, but pull requests are more for things that don't break the build! Also, can you please respond to the thread in #74 as to why an extremely disruptive change should be made that invalidates all digests compute befored the change? Any actual fix is going to need to support old-style-but-valid and new-style-and-valid digests.

@nlevitt
Copy link
Contributor Author

nlevitt commented Oct 29, 2019

Also, can you please respond to the thread in #74 as to why an extremely disruptive change should be made that invalidates all digests compute befored the change?

I didn't think there was any controversy about interpretation of the warc spec. The first comment on #74 spells it out clearly and concisely (thanks @JustAnotherArchivist).

This change doesn't affect "all digests". It only affects records with tranfer-encoded payloads. I fail to see how this is "extremely disruptive"? It's not about old-style vs new-style. The fact that we're here debating this is ironic because the test warc that this pull request applies to was written by warcprox. The original warcprox code had this bug in payload digest calculation and I fixed the bug in 2017.

There are lots of warcs out there with incorrect payload digests (on chunked http responses), and lots of warcs out with correct payload digests. Given that, we should follow the spec going forward.

Bug reports lacking a fix are welcome, but pull requests are more for things that don't break the build!

Fair enough. The failing test could be commented out until the fix is in place, I suppose?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants