Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix recvloop so it doens't use as much memory #1482

Closed
wants to merge 2 commits into from

Conversation

ankur22
Copy link
Collaborator

@ankur22 ankur22 commented Oct 11, 2024

What?

Avoiding working with io.ReadAll when reading off the websocket connection.

Why?

recvloop was using ws.ReadMessage, which was in turn working with io.ReadAll. It turns out for the current use case, this isn't a good way to read the message off the web socket connection. It works better to buffer the read, store it in a buffer that can increase with the test, which is then cloned before being unmarshaled. We have more control over the memory by working in this way.

When working with a simple test against a test website the memory consumption dropped from 120MB to 20MB. This memory usage drop will depend on the website under test. Websites that use a lot of high res images should see a drop in memory usage.

There is still one more thing to do try (in a new PR) to improve the memory/CPU here, which is to try a different JSON unmarshaling tool.

Checklist

  • I have performed a self-review of my code
  • I have added tests for my changes
  • I have commented on my code, particularly in hard-to-understand areas

Related PR(s)/Issue(s)

Updates: #1480

@ankur22 ankur22 force-pushed the fix/io-readall-recvloop branch from c8ee6f4 to 6617966 Compare October 11, 2024 15:36
@ankur22 ankur22 mentioned this pull request Oct 11, 2024
@ankur22 ankur22 requested a review from inancgumus October 11, 2024 15:37
Copy link
Member

@inancgumus inancgumus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice that we find a solution for this tedious issue 🎉

I have suggestions.

This PR adds this:

// Setting the wsBuffer as the same size as bufio.NewReader. We should look at
// optimizing it depending on the average website, or allow it to be
// configurable, but it depends on how much this affects the user.
wsBuffer := make([]byte, 4096)
// This buffer grows as the test progresses.
var recvBuffer []byte
c.logger.Debugf("Connection:recvLoop", "wsURL:%q", c.wsURL)
for {
_, reader, err := c.conn.NextReader()
if err != nil {
c.handleIOError(err)
return
}
// Buffered reads from the websocket connection.
bufReader := bufio.NewReader(reader)
for {
n, err := bufReader.Read(wsBuffer)
if err == io.EOF {
break
}
if err != nil {
c.handleIOError(err)
return
}
recvBuffer = append(recvBuffer, bytes.Clone(wsBuffer[:n])...)
}
// Clone the bytes from the recvBuffer to buf before unmarshaling
// the message. Without this test the unmarshaling will fail as
// it reuses the buf as the underlying bytes buffer.
buf := make([]byte, len(recvBuffer))
copy(buf, recvBuffer)
// Reset recvBuffer so it can be reused.
recvBuffer = recvBuffer[:0]
c.logger.Tracef("cdp:recv", "<- %s", buf)


I believe keeping this already tricky and central part of the code simpler will serve us better in the long run.

Also, correctly using io.Reader directly is risky and highly likely to create unforeseen issues. Since our goal is to reuse the same buffer instead of allocating a new one, we can use bytes.Buffer to let it handle the io.Reader. It reuses the same internal bytes buffer. Go has bytes.Buffer for these kind of situations:

var buffer bytes.Buffer

// loop:
if _, err := buffer.ReadFrom(reader); err != nil {
    // ...
}

// Clone so that we can share the underlying bytes
buf := bytes.Clone(buffer.Bytes())

// Reuse the underlying byte slice
buffer.Reset()

// Use buf here
...

If your profiling shows that bufio.Reader is helpful, you can wrap reader with it: buffer.ReadFrom(bufio.NewReader(reader)).


💡 It also makes sense to add a Benchmark to see how much memory usage we reduce and keep it at that level.

@ankur22
Copy link
Collaborator Author

ankur22 commented Oct 11, 2024

Thanks for the suggestion @inancgumus. I agree that this is less maintainable and more error prone.

I did test with bytes.Buffer (i implemented it in the exact same way as you've shown). In my tests i found bytes.Buffer used more memory than the current implementation. I need to rerun the tests. I like the idea of using benchmark to make it clearer which is the better approach 👍

I'll do three tests:

  1. Without the fix;
  2. With bytes.buffer;
  3. With the current approach.

EDIT: Also, i think this solves one memory issue, but there's still something holding onto the memory 😕 Any idea what root is?

Screenshot 2024-10-11 at 22 43 58

EDIT2: I found where there are other leaks: #1482 (comment)

@ankur22 ankur22 marked this pull request as draft October 11, 2024 21:26
@ankur22 ankur22 force-pushed the fix/io-readall-recvloop branch 2 times, most recently from c7be235 to 00bca99 Compare October 11, 2024 22:33
common/network_manager.go Outdated Show resolved Hide resolved
@ankur22 ankur22 changed the title Fix recvloop so it doens't use as much memory DRAFT: Fix recvloop so it doens't use as much memory Oct 12, 2024
@ankur22 ankur22 force-pushed the fix/io-readall-recvloop branch from 88572e5 to 14fbdd8 Compare October 14, 2024 10:09
@ankur22 ankur22 changed the title DRAFT: Fix recvloop so it doens't use as much memory Fix recvloop so it doens't use as much memory Oct 14, 2024
// This buffer grows as the test progresses.
var recvBuffer []byte
var bufReader bufio.Reader
var buffer bytes.Buffer
Copy link
Collaborator Author

@ankur22 ankur22 Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice that we find a solution for this tedious issue 🎉

I have suggestions.

This PR adds this:

// Setting the wsBuffer as the same size as bufio.NewReader. We should look at
// optimizing it depending on the average website, or allow it to be
// configurable, but it depends on how much this affects the user.
wsBuffer := make([]byte, 4096)
// This buffer grows as the test progresses.
var recvBuffer []byte
c.logger.Debugf("Connection:recvLoop", "wsURL:%q", c.wsURL)
for {
_, reader, err := c.conn.NextReader()
if err != nil {
c.handleIOError(err)
return
}
// Buffered reads from the websocket connection.
bufReader := bufio.NewReader(reader)
for {
n, err := bufReader.Read(wsBuffer)
if err == io.EOF {
break
}
if err != nil {
c.handleIOError(err)
return
}
recvBuffer = append(recvBuffer, bytes.Clone(wsBuffer[:n])...)
}
// Clone the bytes from the recvBuffer to buf before unmarshaling
// the message. Without this test the unmarshaling will fail as
// it reuses the buf as the underlying bytes buffer.
buf := make([]byte, len(recvBuffer))
copy(buf, recvBuffer)
// Reset recvBuffer so it can be reused.
recvBuffer = recvBuffer[:0]
c.logger.Tracef("cdp:recv", "<- %s", buf)

I believe keeping this already tricky and central part of the code simpler will serve us better in the long run.

Also, correctly using io.Reader directly is risky and highly likely to create unforeseen issues. Since our goal is to reuse the same buffer instead of allocating a new one, we can use bytes.Buffer to let it handle the io.Reader. It reuses the same internal bytes buffer. Go has bytes.Buffer for these kind of situations:

var buffer bytes.Buffer

// loop:
if _, err := buffer.ReadFrom(reader); err != nil {
    // ...
}

// Clone so that we can share the underlying bytes
buf := bytes.Clone(buffer.Bytes())

// Reuse the underlying byte slice
buffer.Reset()

// Use buf here
...

I ran this suggested change through my test and it all checks off Perfectly 👍 a631b38

If your profiling shows that bufio.Reader is helpful, you can wrap reader with it: buffer.ReadFrom(bufio.NewReader(reader)).

This didn't make a difference, so I've left it out.

💡 It also makes sense to add a Benchmark to see how much memory usage we reduce and keep it at that level.

I've added a task in the original issue tracking all the memory related issues. I can't create a benchmark test yet as they aren't behaving as we would expect since the memory keeps going up due to the other memory leaks (investigating in #1484). Once memory leaking/usage has come under a bit more control then I think it would be best to complete the benchmarks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran this suggested change through my test and it all checks off Perfectly 👍 a631b38

Nice 🎉

I've added a task in the original #1480 tracking all the memory related issues. I can't create a benchmark test yet as they aren't behaving as we would expect since the memory keeps going up due to the other memory leaks (investigating in #1484). Once memory leaking/usage has come under a bit more control then I think it would be best to complete the benchmarks.

This is fine, but benchmarks can also help with pprofing.

@ankur22 ankur22 requested a review from inancgumus October 14, 2024 14:03
@ankur22 ankur22 marked this pull request as ready for review October 14, 2024 14:03
@ankur22 ankur22 force-pushed the fix/io-readall-recvloop branch from 184556f to c791237 Compare October 14, 2024 14:15
recvloop was using ws.ReadMessage, which was in turn working with
io.ReadAll. It turns out for the current use, this isn't a good way to
read the message of the web socket connection. Instead we should buffer
the read, store it in a buffer that can increase with the test, which
is then cloned before being unmarshaled. We have more control over the
memory by working in this way.
@ankur22 ankur22 force-pushed the fix/io-readall-recvloop branch from c791237 to a631b38 Compare October 14, 2024 14:16
@ankur22 ankur22 marked this pull request as draft October 14, 2024 18:47
@ankur22
Copy link
Collaborator Author

ankur22 commented Oct 16, 2024

Closing this in favour of the actual solution to the problem #1488

@ankur22 ankur22 closed this Oct 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants