Releases · bmjcode/pywebarchive

Massively overhaul the README.
Improved the documentation for the webarchive module.
Expanded and clarified various code comments.
Use a with clause for proper cleanup in test/extracted_archive_display.py.
Rename WebArchive.extract()'s single_file argument to the more descriptive embed_subresources (potentially backwards-incompatible change).
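
Code that has to run against releases on both sides of this rename could pick the keyword at call time. The sketch below shows one way to do that. `extract_compat` and the two stand-in classes are hypothetical and written only for illustration; they are not part of pywebarchive, and they assume `extract()` takes the output path as its first argument.

```python
import inspect


def extract_compat(archive, output_path, embed=False):
    """Call archive.extract() with whichever keyword its signature accepts."""
    params = inspect.signature(archive.extract).parameters
    if "embed_subresources" in params:
        keyword = "embed_subresources"
    else:
        keyword = "single_file"
    return archive.extract(output_path, **{keyword: embed})


class OldStyleArchive:
    """Hypothetical stand-in mimicking the signature before the rename."""

    def extract(self, output_path, single_file=False):
        return ("old", output_path, single_file)


class NewStyleArchive:
    """Hypothetical stand-in mimicking the signature after the rename."""

    def extract(self, output_path, embed_subresources=False):
        return ("new", output_path, embed_subresources)
```

Callers that always passed the flag positionally, or never passed it at all, should not need a shim like this.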

Fixed

Raise a WebArchiveError when attempting to extract a webarchive with no main resource.
Raise a WebArchiveError when attempting to convert a webarchive with no main resource to HTML.
Return the correct value for WebArchive.resource_count() if no main resource is present.
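
The three fixes above share one idea: treat a missing main resource as an explicit case rather than assuming it exists. The toy model below illustrates that guard. `ToyArchive` is a hypothetical simplification, the local `WebArchiveError` only stands in for the library's exception of the same name, and the real `resource_count()` may count things (such as subframe archives) that this sketch ignores.

```python
class WebArchiveError(Exception):
    """Local stand-in for the library's exception of the same name."""


class ToyArchive:
    """Hypothetical, heavily simplified model of an archive."""

    def __init__(self, main_resource=None, subresources=()):
        self.main_resource = main_resource
        self.subresources = list(subresources)

    def resource_count(self):
        # Count the main resource only if it is actually present.
        main = 0 if self.main_resource is None else 1
        return main + len(self.subresources)

    def to_html(self):
        # Fail loudly instead of dereferencing a missing main resource.
        if self.main_resource is None:
            raise WebArchiveError("archive has no main resource")
        return self.main_resource
```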

Removed

The unnecessary  tag previously added to extracted pages.

26 Mar 20:17

bmjcode

v0.4.1

b1bb0bf

Version 0.4.1

Beta "I can't believe I missed that!" release.

In keeping with this project's long tradition of sloppiness, this release was rushed out to fix a single missing line of code that does nothing now, but whose absence would cause trouble if the function it calls is ever implemented.

Some more interesting changes happened in version 0.4.0, including the addition of context manager (with statement) support.

Fixed

Call close() in WebArchive.__exit__().
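
The pattern behind this fix can be sketched with a toy class. `ClosableArchive` is a hypothetical stand-in, not the library's `WebArchive`; it only shows why `__exit__()` should delegate to `close()` even while `close()` has nothing to do.

```python
class ClosableArchive:
    """Hypothetical stand-in; not the library's WebArchive class."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Nothing to release in this toy, but the flag shows the call happened.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        # Returning False lets any exception from the with block propagate.
        return False
```

With this in place, any cleanup later added to `close()` runs automatically when a `with` block ends.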

26 Mar 19:42

bmjcode

v0.4.0

c22a29a

Version 0.4.0

Beta release.

Added

Context manager (with statement) support in the WebArchive class.
The WebArchive.close() method.
The WebArchive.parent property.
Support for the mode argument in webarchive.open() (read mode is still the only mode implemented).

Changed

Further cleaned up internal APIs.
Improved module documentation.

Fixed

Ensure an encoding is always specified when creating a text WebResource.
Removed duplicated code in test/extracted_archive_display.py.

06 Nov 01:46

bmjcode

v0.3.3

958fd32

Version 0.3.3

Beta bugfix release.

Added

Unit tests for HTML- and CSS-rewriting logic.
Build script for the Windows version of Webarchive Extractor.

Changed

Clean up the WebResource class's internal API.
Do not force a newline after the doctype in HTMLRewriter.handle_decl().
Moved test_extracted_archive_display from the unit tests to a separate script.
Removed test_extracted_archive_display's dependency on Tkinter.
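
The doctype change above can be illustrated with Python's standard `html.parser`. The sketch assumes the library's `HTMLRewriter` builds on `HTMLParser`, which the `handle_decl()` name suggests but the notes do not state. `EchoRewriter` is a hypothetical minimal class that writes the declaration back without appending a newline of its own.

```python
import io
from html.parser import HTMLParser


class EchoRewriter(HTMLParser):
    """Hypothetical minimal rewriter that echoes its input unchanged."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = io.StringIO()

    def handle_decl(self, decl):
        # Emit the doctype as-is. A newline that followed it in the source
        # arrives separately through handle_data(), so none is forced here.
        self.out.write(f"<!{decl}>")

    def handle_starttag(self, tag, attrs):
        self.out.write(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.write(f"</{tag}>")

    def handle_data(self, data):
        self.out.write(data)
```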

Fixed

Rewrite URLs in inline CSS code when extracting.
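
As a rough illustration of what rewriting URLs in inline CSS involves, the sketch below swaps `url()` targets using a lookup table. It is not the library's implementation: `rewrite_css_urls`, the `url_map` parameter, and the regular expression are assumptions made for this example, and the pattern ignores edge cases such as escaped quotes.

```python
import re

# Matches url(...) with optional single or double quotes around the target.
CSS_URL = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def rewrite_css_urls(css, url_map):
    """Replace each url() target found in url_map; leave others untouched."""

    def replace(match):
        quote, target = match.group(1), match.group(2)
        return f"url({quote}{url_map.get(target, target)}{quote})"

    return CSS_URL.sub(replace, css)
```

The same function could be applied to the value of a `style` attribute or to the text of a `<style>` block.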

26 Sep 16:04

bmjcode

v0.3.2

2fb1e9c

Version 0.3.2

Beta bugfix release.

Added

The module version number in webarchive.__version__.
Initial support for command-line arguments in extractor-gui.py.
The --version argument in extractor.py and extractor-gui.py.
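
A `--version` flag backed by a module-level version string is commonly wired up with `argparse`, roughly as sketched here. The placeholder version value, the `build_parser` helper, and the optional `input_path` argument are assumptions for illustration; the scripts' actual argument handling is not shown in these notes.

```python
import argparse

# Placeholder value; the notes say the real one lives in webarchive.__version__.
__version__ = "0.0.0"


def build_parser():
    """Build a parser with a --version flag and one hypothetical argument."""
    parser = argparse.ArgumentParser(prog="extractor.py")
    parser.add_argument(
        "--version",
        action="version",
        version=f"%(prog)s {__version__}",
    )
    # Hypothetical positional argument, present only to make the sketch usable.
    parser.add_argument("input_path", nargs="?")
    return parser
```

The `version` action prints the string and exits, so no further handling is needed in the script body.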

Changed

Further code cleanup.
Give more descriptive names to various internals.

Fixed

Support HTML subresources.
Handle non-HTML subresources incorrectly served as text/html.
Update the module description in setup.py to match its documentation.
Specify a text encoding in WebArchiveTest.test_webarchive_to_html() so the test will pass on Windows.
Make webbrowser an optional dependency in extractor.py to match extractor-gui.py.
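
The Windows test fix above presumably comes down to the fact that `open()` without an `encoding` argument uses a locale-dependent default, which on Windows is often not UTF-8. The sketch below shows the explicit form; `write_and_read` is a hypothetical helper and is not taken from the project's test suite.

```python
import os
import tempfile


def write_and_read(text):
    """Round-trip text through a file using an explicit UTF-8 encoding."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "page.html")
        # Naming the encoding on both sides avoids the platform default.
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
```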

25 Sep 23:14

bmjcode

v0.3.1

41f7237

Version 0.3.1

Beta bugfix release.

Added

Unit test for WebArchive.to_html().

Changed

Massively expanded module documentation.
Don't delete the srcset attribute from <img>.
Embed style sheets in single-file mode using data URIs rather than <style>.
Cleaned up various internals.
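
Embedding a style sheet through a data URI, as described above, means base64-encoding its bytes and using the result as a link target. The sketch below is a standalone illustration, not the library's `WebResource.to_data_uri()`; the function name, its parameters, and `stylesheet_link` are assumptions.

```python
import base64


def to_data_uri(data, mime_type):
    """Encode raw bytes as a base64 data URI with the given MIME type."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def stylesheet_link(css_bytes):
    """Build a <link> tag whose href carries the style sheet itself."""
    href = to_data_uri(css_bytes, "text/css")
    return f'<link rel="stylesheet" href="{href}">'
```

Compared with copying the CSS into a `<style>` element, this keeps the original `<link>` element in place and avoids escaping issues inside the style text.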

Fixed

Handle srcset entries without a width or pixel density descriptor.
Embed subresources recursively when calling WebResource.to_data_uri() on an archive's main resource.
Don't escape HTML entities in a <script> or <style> block.
Correctly handle non-HTML main resources.
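
The srcset fix above concerns entries that consist of a URL alone. A simplified parser that tolerates them might look like the sketch below. `parse_srcset` and `rewrite_srcset` are hypothetical names, and splitting on commas is a shortcut that would mishandle URLs containing commas; the HTML specification's parsing algorithm is more involved.

```python
def parse_srcset(value):
    """Split a srcset value into (url, descriptor) pairs.

    The descriptor is an empty string when the entry has none.
    """
    entries = []
    for candidate in value.split(","):
        parts = candidate.split()
        if not parts:
            continue
        url = parts[0]
        descriptor = parts[1] if len(parts) > 1 else ""
        entries.append((url, descriptor))
    return entries


def rewrite_srcset(value, url_map):
    """Rebuild a srcset value, replacing URLs that appear in url_map."""
    rewritten = []
    for url, descriptor in parse_srcset(value):
        new_url = url_map.get(url, url)
        rewritten.append(f"{new_url} {descriptor}".strip())
    return ", ".join(rewritten)
```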

18 Jul 15:26

bmjcode

v0.3.0

f140f78

Version 0.3.0

Beta release.

Added

Experimental support for extracting webarchives to single-file HTML documents.
- External scripts and style sheets are replaced with inline content.
- External images are embedded using data URIs.
New command-line options for extractor.py:
- -s / --single-file to extract archive contents to a single HTML file.
- -o / --open-page to open the extracted webpage when finished.
New WebArchive class methods:
- get_local_path() returns the basename of the file created when a specified subresource is extracted.
- get_subframe_archive() returns the subframe archive corresponding to a specified URL.
- get_subresource() returns the subresource corresponding to a specified URL.
- to_html() returns the archive's contents as a single-file HTML document.
The WebResource.archive property, which identifies a given resource's parent WebArchive.
The WebArchiveError exception.

Changed

Moved the development status up to beta.

Fixed

Correctly handle "empty" tags like <img /> in XHTML documents.
Fixed local resource paths for extracted subframe archives.
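
For the empty-tag fix above, Python's standard `html.parser` reports XHTML-style tags such as `<img />` through a separate callback, `handle_startendtag()`. The sketch below logs which callback fires; `TagLogger` is a hypothetical class, and whether the library handles these tags in exactly this way is an assumption.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical parser that records which tag callback was invoked."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_startendtag(self, tag, attrs):
        # Called for self-closing tags such as <img />.
        self.events.append(("startend", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))
```

A rewriter that overrides `handle_startendtag()` can re-emit the self-closing form instead of a separate start and end tag.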

Removed

The Extractor class, included only for backwards compatibility with the poorly thought-out 0.1.0 API.

22 Feb 21:20

bmjcode

v0.2.4

dc87e46

Version 0.2.4

Alpha-quality code cleanup release.

Added unit tests.
Use webbrowser.open() in extractor-gui.py for improved portability.
