Releases: iipc/jwarc
v0.21.0: Release 0.21.0
New features:
- WarcRevisit Builder with String targetURI #68 (Robert van Loenhout)
Bug fixes:
- WarcWriter.fetch() now uses the TCP nodelay socket option
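For readers unfamiliar with the option, here is a minimal sketch of enabling TCP_NODELAY on a plain `java.net.Socket`. It is not jwarc's own code; the class and method names (`NoDelayExample`, `newNoDelaySocket`) are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

// Hypothetical helper, not part of jwarc.
final class NoDelayExample {
    /** Creates an unconnected socket with Nagle's algorithm disabled (TCP_NODELAY). */
    static Socket newNoDelaySocket() {
        Socket socket = new Socket();
        try {
            // Send small writes immediately instead of coalescing them.
            socket.setTcpNoDelay(true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return socket;
    }
}
```

Disabling Nagle's algorithm tends to help request/response traffic such as a short HTTP request, where waiting to coalesce small writes only adds latency.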
v0.20.1: Release 0.20.1
Changes:
- WarcWriter.fetch() now calculates block and payload SHA-1 digests
- The version() setter on HttpMessage and WarcRecord builders now enforces that the passed version has the expected protocol ("HTTP" and "WARC" respectively)
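As a rough illustration of what block and payload digests look like, the sketch below computes SHA-1 digests with the JDK alone. It assumes the `sha1:` label with Base32 encoding that is commonly seen in WARC files; the release notes only say SHA-1, so treat the output format as an assumption. The class `WarcDigestSketch` and its methods are hypothetical, not jwarc API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch, not jwarc's implementation.
final class WarcDigestSketch {
    private static final String BASE32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

    /** RFC 4648 Base32 without padding (a 20-byte SHA-1 needs none anyway). */
    static String base32(byte[] data) {
        StringBuilder sb = new StringBuilder();
        int buffer = 0;
        int bits = 0;
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xff);
            bits += 8;
            while (bits >= 5) {
                sb.append(BASE32.charAt((buffer >> (bits - 5)) & 31));
                bits -= 5;
            }
        }
        if (bits > 0) {
            // Left-align the remaining bits in a final 5-bit group.
            sb.append(BASE32.charAt((buffer << (5 - bits)) & 31));
        }
        return sb.toString();
    }

    /** Returns a labelled digest such as "sha1:...". The label format is an assumption. */
    static String sha1Digest(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            return "sha1:" + base32(md.digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The payload is the HTTP body; the block is the whole HTTP message.
        byte[] payload = "hello".getBytes(StandardCharsets.US_ASCII);
        byte[] block = "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println("WARC-Block-Digest: " + sha1Digest(block));
        System.out.println("WARC-Payload-Digest: " + sha1Digest(payload));
    }
}
```

The two digests differ because the block digest covers the HTTP headers as well as the body, while the payload digest covers the body only.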
v0.20.0: Release 0.20.0
New features:
- WarcRevisit.Builder.refersTo() now accepts a String for targetURI #66 (Robert van Loenhout)
- Added --save-ca-certificate to the recorder tool.
- Certificates issued by the recording proxy now include the subjectAltName extension to satisfy clients with stricter validation.
v0.19.0: Release 0.19.0
New features
- jwarc now leniently parses HTTP messages that declare Transfer-Encoding: chunked but whose body does not begin with a valid chunk header, by assuming the body is not actually chunked. This improves compatibility with tools like Browsertrix that strip chunked encoding but leave the HTTP header in place.
- ExtractTool will now extract multiple records when given multiple offsets
- CdxTool gained support for the 'N' (normalized SURT) field
- CdxTool gained partial support for pywb's method of encoding request bodies in CDX records. This is still a work in progress and not yet fully compatible with pywb in all cases.
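The lenient chunked handling described in the first item above can be sketched as a simple check on the first bytes of the body. This is a simplified illustration under stated assumptions, not jwarc's actual parser: `ChunkedSniffSketch`, `startsWithChunkHeader` and `effectiveEncoding` are hypothetical names, and a chunk header is taken to be hex digits, an optional `;extension`, then CRLF.

```java
// Hypothetical sketch, not jwarc's implementation.
final class ChunkedSniffSketch {
    /**
     * Returns true if the body starts with something shaped like a chunk header:
     * one or more hex digits, an optional ";extension", then CRLF.
     */
    static boolean startsWithChunkHeader(byte[] body) {
        int i = 0;
        while (i < body.length && isHex(body[i])) {
            i++;
        }
        if (i == 0) {
            return false; // no chunk size at all
        }
        if (i < body.length && body[i] == ';') {
            // Skip any chunk extensions up to the carriage return.
            while (i < body.length && body[i] != '\r') {
                i++;
            }
        }
        return i + 1 < body.length && body[i] == '\r' && body[i + 1] == '\n';
    }

    private static boolean isHex(byte b) {
        return (b >= '0' && b <= '9') || (b >= 'a' && b <= 'f') || (b >= 'A' && b <= 'F');
    }

    /** Decides how to read the body: trust the header only if the body agrees with it. */
    static String effectiveEncoding(String transferEncoding, byte[] body) {
        boolean declaredChunked = transferEncoding != null
                && transferEncoding.trim().equalsIgnoreCase("chunked");
        return declaredChunked && startsWithChunkHeader(body) ? "chunked" : "identity";
    }
}
```

A check like this is only a heuristic: an unchunked body that happens to start with hex digits followed by CRLF would still be treated as chunked.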
v0.18.1: Release 0.18.1
Bugs fixed
- Fixed WarcReader.position(long) not resetting the GunzipChannel's buffer correctly, which caused an exception or the wrong record to be returned after seeking in a gzipped WARC.
v0.18.0: Release 0.18.0
New features
- New cdx package for reading and writing CDX files
- Added --format option to the cdx tool
- Added a basic dedupe tool that can deduplicate records against a CDX server (such as OutbackCDX)
- Added a stats tool that prints counts and sizes of records by status, type and host
- WarcReader now has a .position(long) method for seeking to the start of a particular record (if the underlying channel supports seeking)
Bugs fixed
- HttpRequest/Response.serializeHeader() now returns the exact original bytes for records read from a WarcReader. This means the extract tool no longer reformats extracted HTTP headers.
- Fixed an issue where null record ids would be written as "" instead of omitting the appropriate header. Notably this means revisit records can be constructed with WARC-Refers-To-Target-URI and WARC-Refers-To-Date but without WARC-Refers-To.
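To make the null record id fix concrete, the sketch below shows the intended behaviour in isolation: a header whose value is null is left out rather than written with an empty value. It uses a plain map and a hypothetical `OptionalHeaderSketch` class, so it only illustrates the idea and is not how jwarc builds records.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not jwarc's implementation.
final class OptionalHeaderSketch {
    /** Serializes headers as "Name: value" lines, skipping any header with a null value. */
    static String serialize(Map<String, String> headers) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            if (e.getValue() == null) {
                continue; // omit the header instead of writing an empty value
            }
            sb.append(e.getKey()).append(": ").append(e.getValue()).append("\r\n");
        }
        return sb.toString();
    }

    /** Builds the revisit-related headers; refersTo may be null when the record id is unknown. */
    static Map<String, String> revisitHeaders(String refersTo, String targetUri, String date) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("WARC-Type", "revisit");
        headers.put("WARC-Refers-To", refersTo);
        headers.put("WARC-Refers-To-Target-URI", targetUri);
        headers.put("WARC-Refers-To-Date", date);
        return headers;
    }

    public static void main(String[] args) {
        // No WARC-Refers-To line is printed because the record id is null.
        System.out.print(serialize(
                revisitHeaders(null, "http://example.org/", "2020-01-01T00:00:00Z")));
    }
}
```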
v0.17.2: Release 0.17.2
Bugs fixed
- Fixed 'pushback would result in negative position' exception when parsing HTTP messages missing CRLF at the end of the headers
v0.17.1: Release 0.17.1
Bug fixes
- The lenient HTTP parser now accepts messages missing the terminating CRLFs at the end of the headers (provided there is no message body).
0.17.0
v0.16.5: Release 0.16.5
Bug fixed:
- ValidateTool: fix infinite loop and invalid digest calculation due to incorrect buffer handling
Sorry, no native binaries this time: I haven't gotten around to re-implementing the build process for them as a GitHub Action.