Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce memory usage of http parser #6680

Merged
merged 5 commits into from
Apr 5, 2018
Merged

Conversation

adriansr
Copy link
Contributor

@adriansr adriansr commented Mar 27, 2018

This patch updates the http protocol parser in packetbeat to not buffer the full request/response.

Body is only buffered when neccessary, i.e. the include_body_for option specifies that the body should be included in the event, or a request body of type form-urlencoded is sent.

Fixes #6679

Until now, the http parser buffered the whole request/response in stream.data, including the body if present, and used message.parseOffset to mark what has already been parsed. New packet's payload got appended to stream.data.

Now, the http parser discards parsed data from stream.data, saving the headers in message.rawHeaders and the body (when required) in message.body. Thus making message.parseOffset unnecessary.

A simple benchmark that processes a 10MB request.
It would not catch the case where the functionality to extract
form-urlencoded parameters is broken.
Tests were missing for the following features:
- include_body_for
- send_request
- send_response
return true, true
}

func (p *parser) shouldIncludeInBody(contenttype []byte) bool {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

receiver name p should be consistent with previous receiver name parser for parser

@adriansr adriansr added the in progress Pull request is currently in progress. label Mar 27, 2018
@adriansr adriansr changed the title Reduce memory usage of http parser [WIP] Reduce memory usage of http parser Mar 27, 2018
@adriansr adriansr force-pushed the feature/http_bodies branch 2 times, most recently from f2d8fd5 to 4d7ffef Compare March 29, 2018 10:34
@adriansr adriansr changed the title [WIP] Reduce memory usage of http parser Reduce memory usage of http parser Mar 29, 2018
@adriansr adriansr added review and removed in progress Pull request is currently in progress. labels Mar 29, 2018
@@ -274,7 +267,12 @@ func (http *httpPlugin) doParse(
conn.streams[dir] = st
} else {
// concatenate bytes
if len(st.data)+len(pkt.Payload) > http.maxMessageSize {
totalLength := uint64(len(st.data)) + uint64(len(pkt.Payload))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why change code to cast to uint64?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Originally an int64 variable was involved in that calculation, which forced the cast (msg.size). After a cleanup I didn't realize the cast was redundant.

@@ -514,6 +512,15 @@ func (http *httpPlugin) newTransaction(requ, resp *message) beat.Event {
}
}

func (http *httpPlugin) getRaw(m *message) []byte {
Copy link

@urso urso Apr 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe rename this to buildRawMessage or makeRawMessage. Quite some allocation and copying is happening here.

Also consider this returning a string. When being used, the result is casted to a string. This requires another allocation + copy.

go1.10 is adding a StringBuilder. This would be a good replacement. If you don't want to wait, you could use reflect + pointer casts (this is what StringBuilder does internally):

func (m *message) makeRaw() string {
  sz := len(m.rawHeaders)
  if m.sendBody {
    sz += len(m.body)
  }
  result := make([]byte, 0, sz)  // allocate enough space once, don't grow slice
  result = append(result, m.rawHeaders...)
  if m.sendBody {
    result = append(result, m.body...)
  }
  
  // create string from allocated array
  hdr := (*reflect.SliceHeader)(unsafe.Pointer(&result))
  return *(*string)(unsafe.Pointer(&reflect.StringHeader{
    Data: hdr.Data,
    Len: hdr.Len,
  }))
}

return
}
authHeaderStartX += len(constCRLF)
authHeaderEndX := limit
Copy link

@urso urso Apr 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove the header/bodyOffset in favour of bytes.Index? They still seem to be usefull when post processing messages.

Copy link
Contributor Author

@adriansr adriansr Apr 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removal of headerOffset was a mistake. bodyOffset is unnecessary as the headers and body are now stored in different slices.

m.contentLength += (len(s.data) - s.parseOffset)
s.parseOffset = len(s.data)
if m.saveBody {
m.body = append(m.body, s.data...)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a chance len(body) > http header content length after this append? E.g. another HTTP message starting right after the current one due to HTTP keep-alive/pipelining.

Copy link
Contributor Author

@adriansr adriansr Apr 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in this particular append, as it is only called in the case where the length is not known and connection is close or http/1.0 is used, which rules out keep-alive/pipelining.

@urso
Copy link

urso commented Apr 4, 2018

It's a many small, but subtle changes. Most notable is the removal of parseOffset in favour of accessing s.data directly. Can you add a short comment to the PR description why it's removed and how this changes parsing (I guess the s.data slice is always advanced by number of consumed bytes).

if m.sendBody {
result = append(result, m.body...)
}
// TODO: (go1.10) Use strings.Builder to avoid allocation/copying
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a followup github issue?

Copy link

@urso urso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Maybe add a changelog.

@adriansr adriansr force-pushed the feature/http_bodies branch from a5f287d to 2e4342d Compare April 5, 2018 13:28
This patch updates the http protocol parser in packetbeat to not
buffer the full request/response.

Body is only buffered when neccesary, i.e. the `include_body_for`
option specifies that the body should be included in the event, or
a request body of type form-urlencoded is sent.

Fixes elastic#6679
@adriansr adriansr force-pushed the feature/http_bodies branch from 2e4342d to fc771f2 Compare April 5, 2018 14:20
@urso urso merged commit f4a367e into elastic:master Apr 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants