This repository has been archived by the owner on Sep 24, 2023. It is now read-only.
Releases: bmjcode/pywebarchive
Version 0.5.2
Version 0.5.1
Stable release.
Fixed
- Document the function of the WebResource.frame_name property.
Version 0.5.0
Stable release.
Added
- More complete documentation for the WebArchive and WebResource classes.
- Documentation on pywebarchive's internals.
- Unit test for subresource URLs occurring as literal text.
Changed
- Massively overhaul the README.
- Improved the documentation for the webarchive module.
- Expanded and clarified various code comments.
- Use a with statement for proper cleanup in test/extracted_archive_display.py.
- Rename WebArchive.extract()'s single_file argument to the more descriptive embed_subresources (potentially backwards-incompatible change).
Fixed
- Raise a WebArchiveError when attempting to extract a webarchive with no main resource.
- Raise a WebArchiveError when attempting to convert a webarchive with no main resource to HTML.
- Return the correct value for WebArchive.resource_count() if no main resource is present.
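The two error cases above share one precondition: the archive must have a main resource. A minimal sketch of that check, using a hypothetical ArchiveSketch class rather than pywebarchive's real WebArchive, and assuming resource_count() counts the main resource (only when present) plus its subresources:

```python
class WebArchiveError(Exception):
    """Stand-in for the exception type named in the changelog."""


class ArchiveSketch:
    """Hypothetical archive model: an optional main resource plus subresources."""

    def __init__(self, main_resource=None, subresources=()):
        self.main_resource = main_resource
        self.subresources = list(subresources)

    def _require_main_resource(self):
        # Both extraction and HTML conversion need a main resource to start from.
        if self.main_resource is None:
            raise WebArchiveError("webarchive has no main resource")
        return self.main_resource

    def resource_count(self):
        # Assumption: count the main resource only when it actually exists.
        count = len(self.subresources)
        if self.main_resource is not None:
            count += 1
        return count

    def to_html(self):
        # A real implementation would rewrite and embed subresources here.
        return self._require_main_resource()
```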
Removed
- The unnecessary <!-- Processed by pywebarchive --> comment previously added to extracted pages.
Version 0.4.1
Beta "I can't believe I missed that!" release.
In keeping with this project's long tradition of sloppiness, this release was rushed out to fix a single missing line of code that does nothing now, but whose absence would cause trouble if the function it calls is ever implemented.
Some more interesting changes happened in version 0.4.0, including the addition of context manager (with statement) support.
Fixed
- Call close() in WebArchive.__exit__().
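The 0.4.1 fix concerns the context manager protocol: __exit__() should call close() so that leaving a with block always releases the archive. A minimal sketch, using a hypothetical ClosingArchive class in place of WebArchive:

```python
class ClosingArchive:
    """Hypothetical archive-like object; not pywebarchive's actual class."""

    def __init__(self, path):
        self.path = path
        self.closed = False

    def close(self):
        # Release any held resources; safe to call more than once.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Mirrors the 0.4.1 fix: always call close() when leaving the block.
        self.close()
        # Returning False lets any exception raised inside the block propagate.
        return False


with ClosingArchive("example.webarchive") as archive:
    print(archive.closed)  # False while inside the block
print(archive.closed)      # True after the block exits
```

For objects that only provide close(), the standard library's contextlib.closing() gives the same behavior without writing __enter__() and __exit__() by hand.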
Version 0.4.0
Beta release.
Added
- Context manager (with statement) support in the WebArchive class.
- The WebArchive.close() method.
- The WebArchive.parent property.
- Support for the mode argument in webarchive.open() (though only read mode is implemented).
Changed
- Further cleaned up internal APIs.
- Improved module documentation.
Fixed
- Ensure an encoding is always specified when creating a text WebResource.
- Removed duplicated code in test/extracted_archive_display.py.
Version 0.3.3
Beta bugfix release.
Added
- Unit tests for HTML- and CSS-rewriting logic.
- Build script for the Windows version of Webarchive Extractor.
Changed
- Clean up the WebResource class's internal API.
- Do not force a newline after the doctype in HTMLRewriter.handle_decl().
- Moved test_extracted_archive_display from the unit tests to a separate script.
- Removed test_extracted_archive_display's dependency on Tkinter.
Fixed
- Rewrite URLs in inline CSS code when extracting.
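Rewriting URLs in inline CSS means finding each url(...) reference in style attributes or <style> blocks and replacing it with the extracted file's local path. A simplified, regex-based sketch, assuming a prebuilt url_map from original URLs to local paths (a hypothetical structure, not pywebarchive's); it ignores @import rules written without url() and escaped characters inside URLs:

```python
import re

# Matches url(...) with optional single or double quotes around the target.
_CSS_URL = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""")


def rewrite_css_urls(css, url_map):
    """Replace each url(...) target found in url_map; leave others unchanged."""

    def replace(match):
        quote, url = match.group(1), match.group(2)
        new_url = url_map.get(url, url)
        # Preserve the original quoting style.
        return f"url({quote}{new_url}{quote})"

    return _CSS_URL.sub(replace, css)


style = 'background: url("https://example.com/bg.png")'
print(rewrite_css_urls(style, {"https://example.com/bg.png": "bg.png"}))
```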
Version 0.3.2
Beta bugfix release.
Added
- The module version number in webarchive.__version__.
- Initial support for command-line arguments in extractor-gui.py.
- The --version argument in extractor.py and extractor-gui.py.
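The argparse module's built-in "version" action is one common way to implement a --version argument. This sketch assumes the version string is read from a module-level __version__, as with webarchive.__version__; the parser setup is illustrative, not copied from extractor.py:

```python
import argparse

# Assumption: in the real scripts this would come from webarchive.__version__.
__version__ = "0.3.2"


def build_parser():
    parser = argparse.ArgumentParser(prog="extractor.py")
    # action="version" prints the formatted string to stdout and exits with status 0.
    parser.add_argument(
        "--version", action="version", version=f"%(prog)s {__version__}"
    )
    return parser


if __name__ == "__main__":
    build_parser().parse_args()
```

Running the script with --version would print "extractor.py 0.3.2" and exit.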
Changed
- Further code cleanup.
- Give more descriptive names to various internals.
Fixed
- Support HTML subresources.
- Handle non-HTML subresources incorrectly served as text/html.
- Update the module description in setup.py to match its documentation.
- Specify a text encoding in WebArchiveTest.test_webarchive_to_html() so the test will pass on Windows.
- Make webbrowser an optional dependency in extractor.py to match extractor-gui.py.
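Making a module an optional dependency usually means catching ImportError at import time and degrading gracefully when it is missing. A minimal sketch; the open_page() helper is hypothetical and not taken from extractor.py:

```python
try:
    import webbrowser
except ImportError:
    # Without the module, extraction still works; pages just aren't opened.
    webbrowser = None


def open_page(path):
    """Try to open an extracted page; return True only if a browser was launched."""
    if webbrowser is None:
        return False
    return webbrowser.open(path)
```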
Version 0.3.1
Beta bugfix release.
Added
- Unit test for WebArchive.to_html().
Changed
- Massively expanded module documentation.
- Don't delete the srcset attribute from <img>.
- Embed style sheets in single-file mode using data URIs rather than <style> blocks.
- Cleaned up various internals.
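Embedding a style sheet as a data URI means base64-encoding its bytes and placing the result directly in the href attribute. A standalone sketch using the standard library's base64 module; make_data_uri() is a hypothetical helper, not pywebarchive's WebResource.to_data_uri():

```python
import base64


def make_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URI (RFC 2397)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Hypothetical style sheet content, embedded in place of an external file.
css = b"body { color: red; }"
link_tag = f'<link rel="stylesheet" href="{make_data_uri(css, "text/css")}">'
```

Base64 grows the payload by about a third, which is the usual trade-off for a self-contained single-file page.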
Fixed
- Handle srcset entries without a width or pixel density descriptor.
- Embed subresources recursively when calling WebResource.to_data_uri() on an archive's main resource.
- Don't escape HTML entities in a <script> or <style> block.
- Correctly handle non-HTML main resources.
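Per the HTML standard, a srcset candidate with no width (e.g. 480w) or pixel-density (e.g. 2x) descriptor is treated as 1x. A simplified sketch of parsing and rewriting srcset with that default; it assumes URLs contain no commas or whitespace (the full HTML parsing algorithm handles those), and url_map is a hypothetical mapping to local paths:

```python
def parse_srcset(srcset):
    """Split a srcset value into (url, descriptor) pairs, defaulting to 1x."""
    candidates = []
    for candidate in srcset.split(","):
        parts = candidate.split()
        if not parts:
            continue  # skip empty entries, e.g. from a trailing comma
        url = parts[0]
        # A missing descriptor means 1x, as the HTML standard specifies.
        descriptor = parts[1] if len(parts) > 1 else "1x"
        candidates.append((url, descriptor))
    return candidates


def rewrite_srcset(srcset, url_map):
    """Rebuild a srcset value, swapping in local paths found in url_map."""
    return ", ".join(
        f"{url_map.get(url, url)} {descriptor}"
        for url, descriptor in parse_srcset(srcset)
    )
```

Note that the rewritten value spells out the implicit 1x, which browsers treat identically.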
Version 0.3.0
Beta release.
Added
- Experimental support for extracting webarchives to single-file HTML documents.
  - External scripts and style sheets are replaced with inline content.
  - External images are embedded using data URIs.
- New command-line options for extractor.py:
  - -s/--single-file to extract archive contents to a single HTML file.
  - -o/--open-page to open the extracted webpage when finished.
- New methods in the WebArchive class:
  - get_local_path() returns the basename of the file created when a specified subresource is extracted.
  - get_subframe_archive() returns the subframe archive corresponding to a specified URL.
  - get_subresource() returns the subresource corresponding to a specified URL.
  - to_html() returns the archive's contents as a single-file HTML document.
- The WebResource.archive property, which identifies a given resource's parent WebArchive.
- The WebArchiveError exception.
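get_local_path() returns the basename of the file an extracted subresource is written to. The naming scheme below is a hypothetical illustration, not pywebarchive's: it takes the last segment of the URL path and falls back to a default name when the path ends in a slash.

```python
import posixpath
from urllib.parse import urlsplit


def local_basename(url, default="index.html"):
    """Derive a local file name from a subresource URL (hypothetical scheme)."""
    # urlsplit() separates the path from the query string and fragment.
    path = urlsplit(url).path
    name = posixpath.basename(path)
    return name or default
```

A real extractor also has to avoid collisions when two URLs share a basename, which this sketch does not attempt.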
Changed
- Moved the development status up to beta.
Fixed
- Correctly handle "empty" tags like <img /> in XHTML documents.
- Fixed local resource paths for extracted subframe archives.
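Python's html.parser.HTMLParser reports XHTML-style empty tags such as <img /> through handle_startendtag(), whose default implementation calls handle_starttag() followed by handle_endtag(). A rewriter that echoes markup can override it to keep the self-closing form. The EchoRewriter class below is a hypothetical illustration, not pywebarchive's HTMLRewriter; it also passes <script> and <style> content through unescaped:

```python
from html import escape
from html.parser import HTMLParser

# Elements whose content is raw text and must never be entity-escaped.
RAW_TEXT_ELEMENTS = {"script", "style"}


class EchoRewriter(HTMLParser):
    """Re-serialize HTML/XHTML, preserving self-closing tags like <img />."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._in_raw_text = False

    @staticmethod
    def _format_attrs(attrs):
        # Attribute values arrive with entities decoded, so escape them again.
        parts = []
        for name, value in attrs:
            if value is None:
                parts.append(f" {name}")  # boolean attribute, e.g. disabled
            else:
                parts.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}{self._format_attrs(attrs)}>")
        if tag in RAW_TEXT_ELEMENTS:
            self._in_raw_text = True

    def handle_startendtag(self, tag, attrs):
        # Called for <tag ... />; emit it self-closed instead of as open + close.
        self.out.append(f"<{tag}{self._format_attrs(attrs)} />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")
        if tag in RAW_TEXT_ELEMENTS:
            self._in_raw_text = False

    def handle_data(self, data):
        # Script and style bodies pass through untouched; other text is re-escaped.
        self.out.append(data if self._in_raw_text else escape(data, quote=False))

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")  # e.g. <!DOCTYPE html>

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def rewrite(markup):
    parser = EchoRewriter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```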
Removed
- The Extractor class, included only for backwards compatibility with the poorly thought-out 0.1.0 API.
Version 0.2.4
Alpha-quality code cleanup release.
- Added unit tests.
- Use webbrowser.open() in extractor-gui.py for improved portability.