-
-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix message parsing for documents containing Content-Length keyword #80
Conversation
This commit fixes the logic for parsing a batch of messages in data_received(), where each message body may contain a literal 'Content-Length' keyword. Previously, if a document contained this keyword, the the server would crash, because it attempted to split messages on this keyword. Now, instead of splitting the messages on the keyword, it parses content length from the first keyword, and then extracts the message based on the length. After that, the parsing loop is repeated, until no more messages can be read.
This commit enables receival of partial messages, where the entire message may arrive in chunks. We achieve this by maintaining _message_buf [], which builds the message as the parts arrive, until the entire message (as specified in its Content-Length) can be parsed. Then, the loop continues for any other messages that may arrive together with the first one. The updated regex allows extra headers in the message. Additionally, we change aio_readline() logic to always pass the entire message (with headers) to data_received(). This allows us to avoid extraneous logic than required otherwise. Finally, we add tests for this functionality.
@dinvlad Thanks for the PR, great work! Is it ready for the review or you plan to make more changes? |
I think it's ready, thanks! |
CONTRIBUTORS.md
Outdated
@@ -5,3 +5,4 @@ | |||
- [Max O'Cull](https://github.com/Maxattax97) | |||
- [Tomoya Tanjo](https://github.com/tom-tan) | |||
- [yorodm](https://github.com/yorodm) | |||
- [Denis Loginov](https://github.com/dinvlad) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this alphabetical?
pygls/protocol.py
Outdated
@@ -397,18 +397,43 @@ def connection_made(self, transport: asyncio.Transport): | |||
"""Method from base class, called when connection is established""" | |||
self.transport = transport | |||
|
|||
MESSAGE_PATTERN = re.compile( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this up where other class constants are (alphabetically)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address two very minor (formatting) comments and I will merge it.
This is much better and robust then before, thank you again for submitting this PR!
Edit:
Could you please update the CHANGELOG?
Awesome, thanks for reviewing! I've made the changes. |
I have tested everything and it works. I needed to test json-extension after installing from I merged another PR today, can you just resolve conflicts and include link to your PR? CHANGELOG should look like this: - Fix parsing of partial messages and those with Content-Length keyword ([#80])
- Fix Full SyncKind for servers accepting Incremental SyncKind ([#78])
[#80]: https://github.com/openlawlibrary/pygls/pull/80
[#78]: https://github.com/openlawlibrary/pygls/pull/78 |
Sounds good - resolved, please take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks for the PR @dinvlad!
Description
This PR fixes the logic for parsing a batch of messages in
data_received()
, where each message body may contain a literalContent-Length
keyword.Previously, if a document contained this keyword, then the server would crash, because it attempted to split messages on this keyword.
Now, instead of splitting the messages on the keyword, it parses content length from the first keyword, and then extracts the message based on the length. After that, the parsing loop is repeated, until no more messages can be read.
Update: I added handling of partial messages, since some messages may arrive in chunks.
Code review checklist (for code reviewer to complete)