This repository has been archived by the owner on Sep 24, 2023. It is now read-only.

Releases: bmjcode/pywebarchive

Version 0.5.2

24 Sep 15:26

Final release.

Changed

  • Improved handling of empty attribute values (<img alt="">) and valueless attributes (<iframe seamless>).
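The difference between the two cases shows up when markup is parsed and written back out. The following is a minimal sketch, not pywebarchive's actual code: Python's standard html.parser reports a valueless attribute with a value of None and an empty attribute with an empty string, so a rewriter can keep both forms intact. The AttrEcho class and echo() helper are hypothetical names used only for this illustration.

```python
from html import escape
from html.parser import HTMLParser


class AttrEcho(HTMLParser):
    """Hypothetical helper: re-serialize start tags, preserving attribute forms."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            if value is None:
                # Valueless attribute, e.g. <iframe seamless>
                parts.append(name)
            else:
                # Attribute with a value, possibly empty, e.g. <img alt="">
                parts.append('{}="{}"'.format(name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + ">")


def echo(html_text):
    """Parse html_text and return its start tags as re-serialized markup."""
    parser = AttrEcho()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Treating None and the empty string as the same thing would turn `<img alt="">` into `<img alt>` or `<iframe seamless>` into `<iframe seamless="">`, which is the kind of mix-up this sketch avoids.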

Version 0.5.1

08 Oct 22:16

Stable release.

Fixed

  • Document the function of the WebResource.frame_name property.

Version 0.5.0

16 Apr 17:25

Stable release.

Added

  • More complete documentation for the WebArchive and WebResource classes.
  • Documentation on pywebarchive's internals.
  • Unit test for subresource URLs occurring as literal text.

Changed

  • Massively overhauled the README.
  • Improved the documentation for the webarchive module.
  • Expanded and clarified various code comments.
  • Use a with clause for proper cleanup in test/extracted_archive_display.py.
  • Rename WebArchive.extract()'s single_file argument to the more descriptive embed_subresources (potentially backwards-incompatible change).
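Code that has to run against releases on both sides of the rename can pick the keyword at call time. This is only a sketch under stated assumptions: embed_kwargs() is a hypothetical helper, and extract_new() and extract_old() are stand-ins that model nothing but the two argument names mentioned above; the real WebArchive.extract() signature may differ in other respects.

```python
import inspect


def embed_kwargs(extract_func, embed):
    """Return the keyword argument that extract_func uses for embedding."""
    params = inspect.signature(extract_func).parameters
    if "embed_subresources" in params:
        return {"embed_subresources": embed}  # name used from 0.5.0 on
    if "single_file" in params:
        return {"single_file": embed}  # name used before 0.5.0
    raise TypeError("extract function has no known embedding argument")


# Stand-ins for illustration only; they are not the library's methods.
def extract_new(output_path, embed_subresources=False):
    return ("new", output_path, embed_subresources)


def extract_old(output_path, single_file=False):
    return ("old", output_path, single_file)
```

A caller would then write something like `extract_new("page.html", **embed_kwargs(extract_new, True))`. Passing the flag by keyword rather than by position is what makes the rename visible in the first place; positional callers are unaffected.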

Fixed

  • Raise a WebArchiveError when attempting to extract a webarchive with no main resource.
  • Raise a WebArchiveError when attempting to convert a webarchive with no main resource to HTML.
  • Return the correct value for WebArchive.resource_count() if no main resource is present.

Removed

  • The unnecessary <!-- Processed by pywebarchive --> tag previously added to extracted pages.

Version 0.4.1

26 Mar 20:17

Beta "I can't believe I missed that!" release.

In keeping with this project's long tradition of sloppiness, this release was rushed out to fix a single missing line of code that does nothing now, but whose absence would cause trouble if the function it calls is ever implemented.

Some more interesting changes happened in version 0.4.0, including the addition of context manager (with statement) support.

Fixed

  • Call close() in WebArchive.__exit__().
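The pattern behind this fix can be sketched as follows. The Archive class is a hypothetical stand-in, not the real WebArchive implementation; it only shows why __exit__() should delegate to close() even while close() has little or nothing to do.

```python
class Archive:
    """Hypothetical stand-in for a class with context manager support."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Any cleanup added here later is picked up by "with" blocks for free.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        # Returning False lets any exception from the block propagate.
        return False
```

With this in place, `with Archive() as archive:` guarantees that close() runs when the block ends, whether it ends normally or with an exception.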

Version 0.4.0

26 Mar 19:42

Beta release.

Added

  • Context manager (with statement) support in the WebArchive class.
  • The WebArchive.close() method.
  • The WebArchive.parent property.
  • Support for the mode argument in webarchive.open() (though only read mode remains implemented).

Changed

  • Further cleaned up internal APIs.
  • Improved module documentation.

Fixed

Version 0.3.3

06 Nov 01:46

Beta bugfix release.

Added

  • Unit tests for HTML- and CSS-rewriting logic.
  • Build script for the Windows version of Webarchive Extractor.

Changed

  • Clean up the WebResource class's internal API.
  • Do not force a newline after the doctype in HTMLRewriter.handle_decl().
  • Moved test_extracted_archive_display from the unit tests to a separate script.
  • Removed test_extracted_archive_display's dependency on Tkinter.

Fixed

  • Rewrite URLs in inline CSS code when extracting.
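As a rough illustration of what rewriting URLs in CSS involves, the sketch below replaces the targets of url(...) references using a lookup table. It is an assumption-laden simplification rather than the project's own rewriting logic: rewrite_css_urls() and the local_paths mapping are hypothetical names, and the regular expression ignores escapes, comments, and @import rules that name a URL as a bare string.

```python
import re

# Matches url(...) with optional single or double quotes around the target.
CSS_URL = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def rewrite_css_urls(css, local_paths):
    """Replace url(...) targets found in local_paths; leave others untouched."""

    def replace(match):
        quote, url = match.group(1), match.group(2)
        new_url = local_paths.get(url, url)
        return "url({0}{1}{0})".format(quote, new_url)

    return CSS_URL.sub(replace, css)
```

The same function can be applied to the text of a `<style>` element or to the value of a style attribute, since both hold plain CSS.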

Version 0.3.2

26 Sep 16:04

Beta bugfix release.

Added

  • The module version number in webarchive.__version__.
  • Initial support for command-line arguments in extractor-gui.py.
  • The --version argument in extractor.py and extractor-gui.py.

Changed

  • Further code cleanup.
  • Give more descriptive names to various internals.

Fixed

  • Support HTML subresources.
  • Handle non-HTML subresources incorrectly served as text/html.
  • Update the module description in setup.py to match its documentation.
  • Specify a text encoding in WebArchiveTest.test_webarchive_to_html() so the test will pass on Windows.
  • Make webbrowser an optional dependency in extractor.py to match extractor-gui.py.
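The Windows test fix above comes down to not relying on the platform's default text encoding. A minimal sketch of the idea, using a hypothetical roundtrip() helper and an assumed choice of UTF-8 (the encoding the actual test uses is not stated here):

```python
import os
import tempfile


def roundtrip(text):
    """Write text to a temporary file and read it back with an explicit encoding."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "page.html")
        # Without encoding=..., open() uses a locale-dependent default,
        # which on Windows is often not UTF-8.
        with open(path, "w", encoding="utf-8") as stream:
            stream.write(text)
        with open(path, "r", encoding="utf-8") as stream:
            return stream.read()
```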

Version 0.3.1

25 Sep 23:14

Beta bugfix release.

Added

  • Unit test for WebArchive.to_html().

Changed

  • Massively expanded module documentation.
  • Don't delete the srcset attribute from <img>.
  • Embed style sheets in single-file mode using data URIs rather than <style>.
  • Cleaned up various internals.

Fixed

  • Handle srcset entries without a width or pixel density descriptor.
  • Embed subresources recursively when calling WebResource.to_data_uri() on an archive's main resource.
  • Don't escape HTML entities in a <script> or <style> block.
  • Correctly handle non-HTML main resources.
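To illustrate the srcset fix listed above, here is a simplified sketch of parsing srcset entries whose descriptor is optional. It is not the library's parser: parse_srcset() and rewrite_srcset() are hypothetical names, and splitting on commas is an assumption that breaks for URLs that themselves contain commas.

```python
def parse_srcset(srcset):
    """Split a srcset value into (url, descriptor) pairs.

    The descriptor is "" when an entry has no width or pixel density.
    """
    candidates = []
    for entry in srcset.split(","):
        parts = entry.split()
        if not parts:
            continue  # skip empty entries such as a trailing comma
        url = parts[0]
        descriptor = parts[1] if len(parts) > 1 else ""
        candidates.append((url, descriptor))
    return candidates


def rewrite_srcset(srcset, local_paths):
    """Rebuild a srcset value, replacing URLs found in local_paths."""
    rewritten = []
    for url, descriptor in parse_srcset(srcset):
        new_url = local_paths.get(url, url)
        rewritten.append((new_url + " " + descriptor).strip())
    return ", ".join(rewritten)
```

Code that unconditionally unpacks each entry into a URL and a descriptor fails on a bare URL such as `a.png`, which is the case the fix addresses.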

Version 0.3.0

18 Jul 15:26

Beta release.

Added

  • Experimental support for extracting webarchives to single-file HTML documents.
    • External scripts and style sheets are replaced with inline content.
    • External images are embedded using data URIs.
  • New command-line options for extractor.py:
    • -s / --single-file to extract archive contents to a single HTML file.
    • -o / --open-page to open the extracted webpage when finished.
  • New WebArchive class methods:
    • get_local_path() returns the basename of the file created when a specified subresource is extracted.
    • get_subframe_archive() returns the subframe archive corresponding to a specified URL.
    • get_subresource() returns the subresource corresponding to a specified URL.
    • to_html() returns the archive's contents as a single-file HTML document.
  • The WebResource.archive property, which identifies a given resource's parent WebArchive.
  • The WebArchiveError exception.
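Embedding an image with a data URI, as described in the single-file items above, means base64-encoding its bytes and prefixing them with the MIME type. A minimal sketch follows; make_data_uri() is a hypothetical standalone helper and not part of pywebarchive's API.

```python
import base64


def make_data_uri(data, mime_type):
    """Return a base64 data URI for the given bytes and MIME type."""
    encoded = base64.b64encode(data).decode("ascii")
    return "data:{};base64,{}".format(mime_type, encoded)
```

The result can be used anywhere a URL is expected, for example as the src attribute of an `<img>` element, at the cost of roughly a one-third size increase from base64 encoding.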

Changed

  • Moved the development status up to beta.

Fixed

  • Correctly handle "empty" tags like <img /> in XHTML documents.
  • Fixed local resource paths for extracted subframe archives.
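For the XHTML fix above, Python's standard html.parser calls handle_startendtag() for tags written as `<img />`, which by default just forwards to handle_starttag() and handle_endtag(). The sketch below, which is an illustration and not the project's HTMLRewriter, overrides that hook so the self-closing form survives a parse-and-rewrite pass. EmptyTagEcho and echo_xhtml() are hypothetical names.

```python
from html import escape
from html.parser import HTMLParser


class EmptyTagEcho(HTMLParser):
    """Hypothetical helper: re-serialize markup, keeping <tag /> as written."""

    def __init__(self):
        super().__init__()
        self.out = []

    @staticmethod
    def _format_attrs(attrs):
        parts = []
        for name, value in attrs:
            if value is None:
                parts.append(" " + name)
            else:
                parts.append(' {}="{}"'.format(name, escape(value, quote=True)))
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        self.out.append("<{}{}>".format(tag, self._format_attrs(attrs)))

    def handle_startendtag(self, tag, attrs):
        # Called for XHTML-style empty tags such as <img />.
        self.out.append("<{}{} />".format(tag, self._format_attrs(attrs)))

    def handle_endtag(self, tag):
        self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def echo_xhtml(text):
    """Parse text and return the re-serialized markup."""
    parser = EmptyTagEcho()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)
```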

Removed

  • The Extractor class, included only for backwards compatibility with the poorly thought-out 0.1.0 API.

Version 0.2.4

22 Feb 21:20

Alpha-quality code cleanup release.

  • Added unit tests.
  • Use webbrowser.open() in extractor-gui.py for improved portability.