Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix message parsing for documents containing Content-Length keyword #80

Merged
merged 9 commits into from
Sep 3, 2019

Conversation

dinvlad
Copy link
Contributor

@dinvlad dinvlad commented Aug 24, 2019

Description

This PR fixes the logic for parsing a batch of messages in data_received(), where each message body may contain a literal Content-Length keyword.

Previously, if a document contained this keyword, then the server would crash, because it attempted to split messages on this keyword.

Now, instead of splitting the messages on the keyword, it parses content length from the first keyword, and then extracts the message based on the length. After that, the parsing loop is repeated, until no more messages can be read.

Update: I added handling of partial messages, since some messages may arrive in chunks.

Code review checklist (for code reviewer to complete)

  • Pull request represents a single change (i.e. not fixing disparate/unrelated things in a single PR)
  • Title summarizes what is changing
  • Commit messages are meaningful (see this for details)
  • Tests have been included and/or updated, as appropriate
  • Docstrings have been included and/or updated, as appropriate
  • Standalone docs have been updated accordingly
  • CONTRIBUTORS.md was updated, as appropriate
  • Changelog has been updated, as needed (see CHANGELOG.md)

This commit fixes the logic for parsing a batch of messages
in data_received(), where each message body may contain
a literal 'Content-Length' keyword.

Previously, if a document contained this keyword,
the the server would crash, because it attempted
to split messages on this keyword.

Now, instead of splitting the messages on the keyword,
it parses content length from the first keyword,
and then extracts the message based on the length.
After that, the parsing loop is repeated, until
no more messages can be read.
@danixeee danixeee self-requested a review August 24, 2019 16:23
@danixeee danixeee added bug Something isn't working enhancement New feature or request labels Aug 24, 2019
This commit enables receival of partial messages,
where the entire message may arrive in chunks.

We achieve this by maintaining _message_buf [],
which builds the message as the parts arrive,
until the entire message (as specified in its
Content-Length) can be parsed.

Then, the loop continues for any other messages
that may arrive together with the first one.

The updated regex allows extra headers in the message.

Additionally, we change aio_readline() logic
to always pass the entire message (with headers)
to data_received(). This allows us to avoid
extraneous logic than required otherwise.

Finally, we add tests for this functionality.
@danixeee
Copy link
Contributor

@dinvlad Thanks for the PR, great work! Is it ready for the review or you plan to make more changes?

@dinvlad
Copy link
Contributor Author

dinvlad commented Aug 29, 2019

I think it's ready, thanks!

CONTRIBUTORS.md Outdated
@@ -5,3 +5,4 @@
- [Max O'Cull](https://github.com/Maxattax97)
- [Tomoya Tanjo](https://github.com/tom-tan)
- [yorodm](https://github.com/yorodm)
- [Denis Loginov](https://github.com/dinvlad)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this alphabetical?

@@ -397,18 +397,43 @@ def connection_made(self, transport: asyncio.Transport):
"""Method from base class, called when connection is established"""
self.transport = transport

MESSAGE_PATTERN = re.compile(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this up where other class constants are (alphabetically)?

Copy link
Contributor

@danixeee danixeee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address two very minor (formatting) comments and I will merge it.

This is much better and robust then before, thank you again for submitting this PR!

Edit:

Could you please update the CHANGELOG?

@dinvlad
Copy link
Contributor Author

dinvlad commented Sep 3, 2019

Awesome, thanks for reviewing! I've made the changes.

@danixeee
Copy link
Contributor

danixeee commented Sep 3, 2019

I have tested everything and it works. I needed to test json-extension after installing from .vsix, since it is using IO connection instead of TCP that way.

I merged another PR today, can you just resolve conflicts and include link to your PR?

CHANGELOG should look like this:

- Fix parsing of partial messages and those with Content-Length keyword ([#80])
- Fix Full SyncKind for servers accepting Incremental SyncKind ([#78])

[#80]: https://github.com/openlawlibrary/pygls/pull/80
[#78]: https://github.com/openlawlibrary/pygls/pull/78

@dinvlad
Copy link
Contributor Author

dinvlad commented Sep 3, 2019

Sounds good - resolved, please take a look.

Copy link
Contributor

@danixeee danixeee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for the PR @dinvlad!

@danixeee danixeee merged commit 0d97f79 into openlawlibrary:master Sep 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants