Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with MultipartReader #2302

Open
thehesiod opened this issue Oct 4, 2017 · 15 comments
Open

Issue with MultipartReader #2302

thehesiod opened this issue Oct 4, 2017 · 15 comments
Milestone

Comments

@thehesiod
Copy link
Contributor

I'm trying to mock the google API batch endpoint which takes a multipart request. It ends up sending a request like this:

--===============0167281537324890874==
Content-Type: application/http
MIME-Version: 1.0
Content-Transfer-Encoding: binary
Content-ID: <ad845523-bc65-4f77-bfad-2b8709570634+1>

GET /gmail/v1/users/me/messages/660cf5e1784d4d98b8d3adeabebb413e?format=metadata&alt=json HTTP/1.1
Content-Type: application/json
MIME-Version: 1.0
accept: application/json
accept-encoding: gzip, deflate
user-agent: google-api-python-client/1.6.2 (gzip)
Authorization: Bearer TOKEN
Host: 192.168.16.6:63191


--===============0167281537324890874==--

with headers:

{'Host': '192.168.16.6:63191', 'Content-Length': '553', 'content-type': 'multipart/mixed; boundary="===============0167281537324890874=="', 'authorization': 'Bearer TOKEN', 'user-agent': 'Python-httplib2/0.10.3 (gzip)', 'accept-encoding': 'gzip, deflate'}

And with a handler like the following:

    async def gmail_batch(self, req: web.Request):
        reader = await req.multipart()
        async for part in reader:
            async for data in part:
                print()

I get the exception:

  File "/Users/amohr/dev/Operations/email_security/svc_tests/mock_google_svcs.py", line 116, in gmail_batch
    async for data in part:
  File "/Users/amohr/.pyenv/versions/3.6.2/lib/python3.6/site-packages/aiohttp/multipart.py", line 235, in __anext__
    part = yield from self.next()
  File "/Users/amohr/.pyenv/versions/3.6.2/lib/python3.6/site-packages/aiohttp/multipart.py", line 242, in next
    item = yield from self.read()
  File "/Users/amohr/.pyenv/versions/3.6.2/lib/python3.6/site-packages/aiohttp/multipart.py", line 261, in read
    data.extend((yield from self.read_chunk(self.chunk_size)))
  File "/Users/amohr/.pyenv/versions/3.6.2/lib/python3.6/site-packages/aiohttp/multipart.py", line 279, in read_chunk
    chunk = yield from self._read_chunk_from_stream(size)
  File "/Users/amohr/.pyenv/versions/3.6.2/lib/python3.6/site-packages/aiohttp/multipart.py", line 322, in _read_chunk_from_stream
    assert self._content_eof < 3, "Reading after EOF"
AssertionError: Reading after EOF

seems like it's not correctly determining the boundary

@thehesiod
Copy link
Contributor Author

looks like in my example the lines were separated by \n's, and not \r\n

@thehesiod
Copy link
Contributor Author

thehesiod commented Oct 4, 2017

I've traced this back to the fact that the default linesep for the email module in python is '\n': https://github.com/python/cpython/blob/master/Lib/email/_policybase.py#L163 used by https://github.com/google/google-api-python-client/blob/master/googleapiclient/http.py#L1328 since policy=None. So the question is should aiohttp check for '\n' + boundary or `\r\n' + boundary as it currently does.

@thehesiod
Copy link
Contributor Author

in particular the issue is on this line: https://github.com/aio-libs/aiohttp/blob/master/aiohttp/multipart.py#L324

@asvetlov
Copy link
Member

asvetlov commented Oct 5, 2017

https://tools.ietf.org/html/rfc7578 explicitly states CRLF as a separator.
We could support LF also though but I not sure.
Guys, opinions?

@kxepal
Copy link
Member

kxepal commented Oct 5, 2017

Only if optional to be explicitly enable by user.

From Python docs about linesep:

It defaults to ``\n`` because that is the most useful value for Python application code (other library packages expect ``\n`` separated lines).  ``linesep=\r\n`` can be used to generate output with RFC-compliant line separators.

I think we should stay with RFC by default.

@thehesiod
Copy link
Contributor Author

thehesiod commented Oct 5, 2017 via email

@kxepal
Copy link
Member

kxepal commented Oct 5, 2017

I don't think this check will change everything. We must follow the specifications in all the questions. That's the only way to make sure we're doing everything right.

However, we have to admit that life is...complicated and legacy still exists and backward compatibility with ancients still have to be preserved by major providers etc. etc. For such cases, if they are really worth to be supported (and if they cannot fix their code due to same BC), well, some fix may have to be implemented, but as opt-in feature.

Even not all browsers correctly support multipart spec, so I think some room of customization is possible here.

@asvetlov asvetlov added this to the 3.0 milestone Oct 19, 2017
@asvetlov asvetlov modified the milestones: 3.0, 3.1 Feb 12, 2018
@asvetlov asvetlov modified the milestones: 3.1, 3.2 Mar 22, 2018
@asvetlov
Copy link
Member

asvetlov commented May 4, 2018

The issue is still waiting for a hero.
We need to accept separator argument in both multipart reader and writer (\r\n by default).
The change is not very hard but requires some amount of work.

@thehesiod
Copy link
Contributor Author

btw, I'd like to be able to optionally support both styles, so perhaps a boolean instead of specify if should allow for both?

@asvetlov
Copy link
Member

asvetlov commented May 4, 2018

Do you mean a heuristic for detection what style should be used by the reader, \r\n or just \n?
Writer should know the style on construction.

@thehesiod
Copy link
Contributor Author

yes, ala splitlines. There should be a mode where the several can accept either way, that's what the google servers do for example. I bet apache as well.

@asvetlov
Copy link
Member

asvetlov commented May 4, 2018

@thehesiod would you work on implementation?
The first version can be done without autodetection.

@thehesiod
Copy link
Contributor Author

ya I'll give it a shot

@asvetlov
Copy link
Member

asvetlov commented May 4, 2018

Cool!

@asvetlov asvetlov modified the milestones: 3.2, 3.3 May 7, 2018
@webknjaz
Copy link
Member

Nota bene

https://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.3 (but it's obsolete since June 2014 by RFC7230):

19.3 Tolerant Applications
...
The line terminator for message-header fields is the sequence CRLF. However, we recommend that applications, when parsing such headers, recognize a single LF as a line terminator and ignore the leading CR.
...

(https://stackoverflow.com/a/5757349/595220)

https://tools.ietf.org/html/rfc7231#section-3.1.1.3:

3.1.1.3. Canonicalization and Text Defaults
...
An HTTP sender MAY generate, and a recipient MUST be able to parse, line breaks in text media that consist of CRLF, bare CR, or bare LF.... This flexibility regarding line breaks applies only to text within a representation that has been assigned a "text" media type; it does not apply to "multipart" types or HTTP elements outside the payload body (e.g., header fields).
...
3.1.1.4. Multipart Types
...
The message body is itself a protocol element; a sender MUST generate only CRLF to represent line breaks between body parts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants