Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for base64 encoded images #43

Merged
merged 1 commit into from
Dec 10, 2019

Conversation

francoisfreitag
Copy link
Contributor

Allows constructing documents with images that are not available publicly over the Internet.

@jdufresne
Copy link
Contributor

Can you explain the advantage of this approach over supporting base64 data URLs?

https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URIs

@francoisfreitag
Copy link
Contributor Author

francoisfreitag commented Dec 4, 2019

I did not think of using base64 to encode the image, that offers the advantage of using existing mechanisms rather than implementing a custom one, with its set of issues such as uses the data-html2docx-image attribute, and checking the image presence in the imagesdict.

The only advantage I see to accepting BytesIO is that it avoids converting back and forth to base64. I prefer to build data URLs support.

@francoisfreitag francoisfreitag force-pushed the private_images branch 2 times, most recently from ed47bef to acf9f4c Compare December 4, 2019 14:37
@francoisfreitag francoisfreitag changed the title Accept images from BytesIO Add support for base64 encoded images Dec 4, 2019
@francoisfreitag francoisfreitag force-pushed the private_images branch 3 times, most recently from fb8c724 to 7d154ae Compare December 4, 2019 14:51
@francoisfreitag
Copy link
Contributor Author

Now supports base64 encoded images.

Copy link
Contributor

@jdufresne jdufresne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

html2docx/image.py Outdated Show resolved Hide resolved
@@ -30,7 +50,7 @@ def load_image(src: str) -> io.BytesIO:
# Read up to MAX_IMAGE_SIZE when response does not contain
# the Content-Length header. The extra byte avoids an extra read to
# check whether the EOF was reached.
data = response.read(MAX_IMAGE_SIZE + 1)
data = cast(bytes, response.read(MAX_IMAGE_SIZE + 1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're interested, we could contribute to typeshed to avoid this type cast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this option. urlopen() returns either a HTTPResponse or an addinfourl. Both implement a read() that returns bytes. The return type used to be specified as the Union of both types, but returning a Union type is discouraged and the urlopen() was changed to return type Any.

An option might be to use structural subtyping and create a Protocol, so that methods common to addinfourl and HTTPResponse are hinted correctly. That would cover the read() method and probably a few others, but adds a maintenance burden.

@francoisfreitag francoisfreitag force-pushed the private_images branch 2 times, most recently from e146726 to 2b2a949 Compare December 6, 2019 11:35
@francoisfreitag
Copy link
Contributor Author

Changes:

  • Only check for RFC_2397_BASE64 in the header. To be even stricter, look for substring ;base64,, as the grammar defines that parameter is only allowed right before the comma. The suggested patch would error when there was no comma in the data url. Add test_inline_base64_marker_in_data(): to cover suggested case and test_inline_missing_comma.

https://tools.ietf.org/html/rfc2397

Reviewed-by: Jon Dufresne
@francoisfreitag francoisfreitag merged commit cd23bf5 into erezlife:master Dec 10, 2019
@francoisfreitag francoisfreitag deleted the private_images branch December 10, 2019 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants