.discover fails on whitespace #7

Open
dgtlmoon opened this issue Aug 1, 2018 · 3 comments

Comments

@dgtlmoon

dgtlmoon commented Aug 1, 2018

Another project, https://github.com/fcurella/python-datauri/ (pip3 install python-datauri), has no problems with this input.

FAIL
<img src="data:image/jpeg;base64, iVBORw0KGgoAAAANSUhEUgAgglpyJYUQnIAABCJJggg==">

PASS
<img src="data:image/jpeg;base64,iVBORw0KGgoAAAANSUhEUgAgglpyJYUQnIAABCJJggg==">

@wbolster
Collaborator

wbolster commented Aug 2, 2018

hi, thanks for your contribution.

according to rfc 2397 the syntax is:

       dataurl    := "data:" [ mediatype ] [ ";base64" ] "," data
       mediatype  := [ type "/" subtype ] *( ";" parameter )
       data       := *urlchar
       parameter  := attribute "=" value

...which means that whitespace is not allowed.
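a rough sketch of what that grammar implies, using only Python's `re` module. this is not this library's actual pattern: `STRICT_DATA_URL` and `is_strict_data_url` are hypothetical names, and the `urlchar` class is an assumption (rfc 2397 defers to the URL RFCs for it), approximated here as common URI characters plus percent-escapes.

```python
import re

# Assumption: urlchar approximated as URI unreserved/reserved characters plus
# percent-escapes. Space and control characters are deliberately excluded.
URLCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})"

# Hypothetical strict pattern following the shape of the RFC 2397 grammar.
STRICT_DATA_URL = re.compile(
    r"data:"
    r"(?P<mediatype>[^,;\s]+/[^,;\s]+)?"      # [ type "/" subtype ]
    r"(?P<params>(?:;[^,;=\s]+=[^,;\s]+)*)"   # *( ";" attribute "=" value )
    r"(?P<base64>;base64)?"                   # [ ";base64" ]
    r","
    r"(?P<data>" + URLCHAR + r"*)"            # *urlchar
)


def is_strict_data_url(text):
    """Return True only if the whole string matches the strict pattern."""
    return STRICT_DATA_URL.fullmatch(text) is not None


passing = "data:image/jpeg;base64,iVBORw0KGgoAAAANSUhEUgAgglpyJYUQnIAABCJJggg=="
failing = "data:image/jpeg;base64, iVBORw0KGgoAAAANSUhEUgAgglpyJYUQnIAABCJJggg=="
print(is_strict_data_url(passing))
print(is_strict_data_url(failing))
```

with a pattern like this, the space after the comma in the second string has nowhere to match, so the reported FAIL case is rejected while the PASS case is accepted.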

the data: uri page on wikipedia contains:

In this example, the lines are broken for formatting purposes. In actual URIs, including data URIs, control characters (ASCII 0 to 31, and 127) and spaces (ASCII 32) are "excluded characters". This means that whitespace characters are not permitted in data URIs.

though it continues like this:

However, in the context of HTML 4 and HTML 5, linefeeds within an element attribute value (such as the "src" above) are ignored[citation needed]. So the data URI above would be processed ignoring the linefeeds, giving the correct result. But note that this is an HTML feature, not a data URI feature, and in other contexts, it is not possible to rely on whitespace within the URI being ignored.

i am not sure what the best behaviour would be here.

(note: how another library behaves is not decisive here. that other library also seems to reparse on every attribute access, which this library will never do.)

@ghost

ghost commented Aug 2, 2018

It seems to me like it should be safe to strip leading/trailing whitespace if the data is base64-encoded. I would be in support of doing that for convenience's sake. Other data types may care about whitespace.
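To make that suggestion concrete, here is a minimal sketch using only the standard library. It is an illustration of the idea, not code from this project: `split_data_url` and `decode_payload_lenient` are hypothetical helpers, and the header handling is simplified to a check for a trailing `;base64`.

```python
import base64
from urllib.parse import unquote_to_bytes


def split_data_url(text):
    """Split a data URL into (header, payload) at the first comma."""
    if not text.startswith("data:"):
        raise ValueError("not a data URL")
    header, sep, payload = text[len("data:"):].partition(",")
    if not sep:
        raise ValueError("missing comma separator")
    return header, payload


def decode_payload_lenient(text):
    """Decode the payload, stripping outer whitespace only for base64 data."""
    header, payload = split_data_url(text)
    if header.endswith(";base64"):
        # Leading/trailing whitespace carries no meaning in base64 input.
        return base64.b64decode(payload.strip(), validate=True)
    # Other payloads are left untouched, since whitespace may be significant.
    return unquote_to_bytes(payload)


print(decode_payload_lenient("data:text/plain;base64, aGVsbG8= "))
print(decode_payload_lenient("data:text/plain, hi"))
```

The base64 branch tolerates the stray space, while the plain-text branch returns the payload exactly as given, leading space included.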

@wbolster
Collaborator

from #8 and #8 (comment)

i personally think code should either be strict (no whitespace, per spec) or loose (strip all whitespace, useful for html-ish data), not something in between.
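a small sketch of what that either/or could look like, assuming a hypothetical `prepare_data_url` helper with a `strict` flag (neither exists in this library as far as this thread shows). strict mode rejects any whitespace; loose mode removes all of it before parsing.

```python
import re

_WHITESPACE = re.compile(r"\s")


def prepare_data_url(text, strict=True):
    """Hypothetical pre-processing step run before the real parser.

    strict=True:  any whitespace is an error (per RFC 2397).
    strict=False: all whitespace is removed (useful for HTML-ish input).
    """
    if strict:
        if _WHITESPACE.search(text):
            raise ValueError("whitespace is not allowed in a data URL")
        return text
    return _WHITESPACE.sub("", text)


html_ish = "data:image/jpeg;base64, iVBORw0KGgoAAAANSUhEUgAgglpyJYUQnIAABCJJggg=="
print(prepare_data_url(html_ish, strict=False))
```

loose mode also drops whitespace from non-base64 payloads, which is consistent with the spec (a literal space there would have to be written as `%20` anyway), so there is no in-between case to define.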
