-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add zero-copy serialization APIs. #357
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some small comments but LGTM, nice work!
buffer.go
Outdated
// Keep the buffer alive during the write, to prevent the GC from | ||
// collecting the memory while it's being used. | ||
n, err := w.Write(unsafe.Slice((*byte)(cbuffer), writeSize)) | ||
runtime.KeepAlive(b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think since b
is the object that implements this method the keepalive isn't needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. I was being extra cautious, but if we call b.WriteTo(w)
, it becomes clear that b
is being kept somewhere reachable.
serialize.go
Outdated
return &buffer, nil | ||
} | ||
|
||
// SerializeArrayNonEmptyDomainToBuffer gets and serializes the array nonempty domain and returns a Buffer object containing the payload. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// SerializeArrayNonEmptyDomainToBuffer gets and serializes the array nonempty domain and returns a Buffer object containing the payload. | |
// SerializeArrayNonEmptyDomain gets and serializes the array nonempty domain. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks for catching it.
} | ||
assert.NotEmpty(t, bytes.Bytes()) | ||
runtime.GC() | ||
runtime.GC() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the purpose of the last few lines in this test, but I see there was one like it just below so I'm probably missing something.
Should there be a check here that buffer
was garbage collected, or is the test verifying something else with the explicit calls to runtime.GC()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied the GC calls from the existing test, I don't know what their purpose was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this very useful optimization, great job! One question, is this Core PR a prerequisite? If yes let's mark this one as Draft so that we don't accidentally merge it before Go has bundled to a Core version that includes that change.
Strictly speaking no, this PR can be merged without waiting for the Core PR. The Core PR is a prerequisite for fully taking advantage of the APIs added in this PR in the REST server (and even this might not be true, I have thought of a potential workaround). |
It will make some stuff easier in the REST server.
fcfc6ff
to
38155e3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat work, thanks for this! Left a few comments.
// Because io.Writer supports writing up to 2GB of data at a time, we have to use a loop | ||
// for the bigger buffers. | ||
for remaining > 0 { | ||
// TODO: Use min on Go 1.21+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's probably no harm in updating TileDB-Go's Go version. Let's address this in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will bump when I do SC-58351 which needs 1.21+.
availableBytes := uint64(csize) - uint64(off) | ||
var sizeToRead int | ||
if availableBytes > math.MaxInt { | ||
sizeToRead = math.MaxInt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we capping sizeToRead
because of copy()
returning an integer? If so, we may want to use io.Copy()
instead because it returns an int64
. In fact, I don't know if the built-in copy
function even uses the io.WriterTo
interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how io.Copy()
is relevant here as we don't have neither a Reader
nor a Writer
available. See also my other comment about the choice of int
and the type of slice lengths.
buffer.go
Outdated
} | ||
|
||
// Construct a slice from the buffer's data without copying it. | ||
// Keep the buffer alive during the write, to prevent the GC from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can use runtime.KeepAlive
here, like we do in other parts of the code? Not sure if it's necessary though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure either if a keep alive is needed, see also my SO question. I removed it after #357 (comment), but missed the comment which I have now removed as well.
|
||
written := int64(0) | ||
|
||
for i := uint(0); i < uint(nbuffs); i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While unlikely, technically this cast could overflow, no? nbuffs
is uint64 and uint(nbuffs)
may be uint32.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetBuffer()
unfortunately accepts a uint
.
6bf65e3
to
417820c
Compare
Latest changes validated in the REST server (see CI of PR 5103, failures look unrelated). |
After writing the last slice, we produced UB by incrementing cbuffer to one byte past the end of the buffer. Now we keep `cbuffer` immutable (only `remaining` gets mutated) and increment it in the statement that calls `w.Write()`. Now the test passes locally with `-d=checkptr=2`.
We have tested this change in Core REST-CI and it is green. We have also successfully run SOMA Benchmarks using this PR and https://github.com/TileDB-Inc/TileDB-Cloud-REST/pull/5103 and have demonstrated the perf improvements. |
SC-58009
This PR updates the
Buffer
andBufferList
types to implement theio.WriterTo
interface, by writing the data directly to anio.Writer
without an intermediate copy to Go-backed memory. ForBuffer
we do that with theUnsafe.Slice
function (it is safe as long as theWriter
follows the contract and does not retain the slice after the call toWrite
), and forBufferList
we callWriteTo
for each buffer.In order to take advantage of this, new serialization APIs were added that return a
Buffer
instead of abyte[]
(and unavoidably had to make a copy to Go-managed memory). The old serialization APIs, as well as theBuffer.Serialize()
andBufferList.Flatten()
functions were deprecated.