-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make sure the closing boundary is written when calling 'write' on an empty MultipartWriter #3520
make sure the closing boundary is written when calling 'write' on an empty MultipartWriter #3520
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice found! There is a question about the right form of generated data for this case, propose to discuss that (:
@@ -848,6 +848,11 @@ class CustomReader(aiohttp.MultipartReader): | |||
b'\r\n') == bytes(buf)) | |||
|
|||
|
|||
async def test_writer_write_no_parts(buf, stream, writer) -> None: | |||
await writer.write(stream) | |||
assert b'--:--\r\n' == bytes(buf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there should be just closed boundary or pair of open-close ones. Need to check how others handles this case. Python's MIMEMultipart does the former for instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a pair of open-close would mean a single empty part
just a close boundary means no parts
they are two different things, so i think this is the right approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they are different, but that's how stdlib parser acts on empty list and that's what RFC defines about multipart:
multipart-body := preamble 1*encapsulation
close-delimiter epilogue
encapsulation := delimiter CRLF body-part
delimiter := CRLF "--" boundary ; taken from Content-Type
field.
; when content-type is
multipart
; There must be no space
; between "--" and boundary.
close-delimiter := delimiter "--" ; Again, no space before
"--"
preamble := *text ; to be ignored upon
receipt.
epilogue := *text ; to be ignored upon
receipt.
body-part = <"message" as defined in RFC 822,
with all header fields optional, and with the
specified delimiter not occurring anywhere in
the message body, either on a line by itself
or as a substring anywhere. Note that the
semantics of a part differ from the semantics
of a message, as described in the text.>
Encapsulation with open boundary is required, otherwise it becomes malformed data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, we didn't have any issues of bad multipart reading because no data, but I guess we could meet such one. That's why I'm worry about compatibility of this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no preference here.
Guys, do you know how other libraries work?
RFC compliance sounds a bold argument for me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most up to date RFC is https://tools.ietf.org/html/rfc2046, but I agree it specifies that there must be at least one body part.
I would argue however that this RFC does not allow representing a multipart body with no body part, so we can't follow the RFC in this case: adding an open-close boundary would be parsed as "single empty body part" by any server - which is of course the expected behaviour - so imho it is not an option here.
I definitely agree with @asvetlov that aiohttp should mimic existing librairies here:
- urllib3 (and requests) serialise empty body with a lone closing boundary: https://github.com/urllib3/urllib3/blob/master/src/urllib3/filepost.py#L63
- the
email.message
module in the stdlib does indeed follow the RFC: https://github.com/python/cpython/blob/master/Lib/email/generator.py#L291-L304 . But the consequence of this is that a "serialize-parse" loop with the stdlib implementation is not a no-op…
21690e6
to
5b2ff76
Compare
Codecov Report
@@ Coverage Diff @@
## master #3520 +/- ##
==========================================
- Coverage 97.94% 97.18% -0.76%
==========================================
Files 44 44
Lines 8562 8560 -2
Branches 1381 1379 -2
==========================================
- Hits 8386 8319 -67
- Misses 71 122 +51
- Partials 105 119 +14
Continue to review full report at Codecov.
|
…empty MultipartWriter
5b2ff76
to
0ca99fd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Should we backport it to 3.5?
We don't need the backport to 3.5, so I guess we can skip this part, and reconsider if anyone resurface the issue? Do you want me to add a NEWS entry? feature or bugfix? |
…on an empty MultipartWriter (#3520) (cherry picked from commit 0ad5d90) Co-authored-by: Arthur Darcet <[email protected]>
@arthurdarcet the PR fails a master with 1 crashed test: https://travis-ci.com/aio-libs/aiohttp/jobs/199387254 |
#3758 provides a fix |
…on an empty MultipartWriter (#3520) (#3756) (cherry picked from commit 0ad5d90) Co-authored-by: Arthur Darcet <[email protected]>
The MultipartWriter class produces an empty body when it represents an "empty" payload (ie no parameters are present).
If I'm not mistaken, this is not a valid way to represent an empty payload, and it actually breaks the MultipartReader: the Content-Type header should be set even for an empty payload (with a boundary), and the "end of message" token
--BOUNDARY--\r\n
should still be in the bodyFor instance, this client
will crash this server:
The proposed change makes the MultipartWriter output the end-of-message token even if the payload is empty.
I've added a regression test (that fails against master, so that's another way to reproduce what my example above does)
The PR also modifies the other
web_functionnal
tests to remove the explicitheaders
argument, AFAIK the documented way of using MultipartWriter is not to manually copy the headers.Let me know if this change is acceptable, and if you consider this a feature, bugfix, … (ie how should I document this)
Thanks!