Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add zero-copy serialization APIs. #357

Merged
merged 20 commits into from
Dec 12, 2024
Merged

Add zero-copy serialization APIs. #357

merged 20 commits into from
Dec 12, 2024

Conversation

teo-tsirpanis
Copy link
Member

SC-58009

This PR updates the Buffer and BufferList types to implement the io.WriterTo interface, by writing the data directly to an io.Writer without an intermediate copy to Go-backed memory. For Buffer we do that with the Unsafe.Slice function (it is safe as long as the Writer follows the contract and does not retain the slice after the call to Write), and for BufferList we call WriteTo for each buffer.

In order to take advantage of this, new serialization APIs were added that return a Buffer instead of a byte[] (and unavoidably had to make a copy to Go-managed memory). The old serialization APIs, as well as the Buffer.Serialize() and BufferList.Flatten() functions were deprecated.

@teo-tsirpanis teo-tsirpanis requested a review from a team as a code owner November 26, 2024 17:05
Copy link
Collaborator

@shaunrd0 shaunrd0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some small comments but LGTM, nice work!

buffer.go Outdated
// Keep the buffer alive during the write, to prevent the GC from
// collecting the memory while it's being used.
n, err := w.Write(unsafe.Slice((*byte)(cbuffer), writeSize))
runtime.KeepAlive(b)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think since b is the object that implements this method the keepalive isn't needed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. I was being extra cautious, but if we call b.WriteTo(w), it becomes clear that b is being kept somewhere reachable.

serialize.go Outdated
return &buffer, nil
}

// SerializeArrayNonEmptyDomainToBuffer gets and serializes the array nonempty domain and returns a Buffer object containing the payload.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// SerializeArrayNonEmptyDomainToBuffer gets and serializes the array nonempty domain and returns a Buffer object containing the payload.
// SerializeArrayNonEmptyDomain gets and serializes the array nonempty domain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks for catching it.

serialize.go Outdated Show resolved Hide resolved
}
assert.NotEmpty(t, bytes.Bytes())
runtime.GC()
runtime.GC()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the purpose of the last few lines in this test, but I see there was one like it just below so I'm probably missing something.

Should there be a check here that buffer was garbage collected, or is the test verifying something else with the explicit calls to runtime.GC()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied the GC calls from the existing test, I don't know what their purpose was.

buffer_list.go Outdated Show resolved Hide resolved
@ypatia ypatia requested a review from a team November 27, 2024 08:47
Copy link
Member

@ypatia ypatia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this very useful optimization, great job! One question, is this Core PR a prerequisite? If yes let's mark this one as Draft so that we don't accidentally merge it before Go has bundled to a Core version that includes that change.

@teo-tsirpanis
Copy link
Member Author

Strictly speaking no, this PR can be merged without waiting for the Core PR. The Core PR is a prerequisite for fully taking advantage of the APIs added in this PR in the REST server (and even this might not be true, I have thought of a potential workaround).

@teo-tsirpanis teo-tsirpanis requested a review from a team November 27, 2024 09:49
Copy link
Collaborator

@NullHypothesis NullHypothesis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat work, thanks for this! Left a few comments.

// Because io.Writer supports writing up to 2GB of data at a time, we have to use a loop
// for the bigger buffers.
for remaining > 0 {
// TODO: Use min on Go 1.21+
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's probably no harm in updating TileDB-Go's Go version. Let's address this in a separate PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will bump when I do SC-58351 which needs 1.21+.

buffer.go Outdated Show resolved Hide resolved
buffer.go Outdated Show resolved Hide resolved
buffer.go Outdated Show resolved Hide resolved
availableBytes := uint64(csize) - uint64(off)
var sizeToRead int
if availableBytes > math.MaxInt {
sizeToRead = math.MaxInt
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we capping sizeToRead because of copy() returning an integer? If so, we may want to use io.Copy() instead because it returns an int64. In fact, I don't know if the built-in copy function even uses the io.WriterTo interface.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how io.Copy() is relevant here as we don't have neither a Reader nor a Writer available. See also my other comment about the choice of int and the type of slice lengths.

buffer.go Outdated
}

// Construct a slice from the buffer's data without copying it.
// Keep the buffer alive during the write, to prevent the GC from
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use runtime.KeepAlive here, like we do in other parts of the code? Not sure if it's necessary though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure either if a keep alive is needed, see also my SO question. I removed it after #357 (comment), but missed the comment which I have now removed as well.


written := int64(0)

for i := uint(0); i < uint(nbuffs); i++ {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While unlikely, technically this cast could overflow, no? nbuffs is uint64 and uint(nbuffs) may be uint32.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetBuffer() unfortunately accepts a uint.

buffer_test.go Show resolved Hide resolved
buffer_test.go Outdated Show resolved Hide resolved
serialize.go Outdated Show resolved Hide resolved
@teo-tsirpanis
Copy link
Member Author

Latest changes validated in the REST server (see CI of PR 5103, failures look unrelated).

After writing the last slice, we produced UB by incrementing cbuffer to one byte past the end of the buffer.
Now we keep `cbuffer` immutable (only `remaining` gets mutated) and increment it in the statement that calls `w.Write()`.
Now the test passes locally with `-d=checkptr=2`.
@ypatia
Copy link
Member

ypatia commented Dec 12, 2024

We have tested this change in Core REST-CI and it is green. We have also successfully run SOMA Benchmarks using this PR and https://github.com/TileDB-Inc/TileDB-Cloud-REST/pull/5103 and have demonstrated the perf improvements.
I think it is now that testing has been completed and comments have been addressed, it is safe to get merged.

@ypatia ypatia merged commit 9a66fdf into master Dec 12, 2024
10 checks passed
@teo-tsirpanis teo-tsirpanis deleted the teo-buffer-zero-copy branch December 12, 2024 12:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants