-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DB serialization support #132
base: master
Are you sure you want to change the base?
Conversation
I was about to send a PR to expose the same functions! IMHO, bundling the schema name and the serialization does not make much sense, though. The schema name is a context-specific choice, not a property of the serialization. For example, one can serialize the |
|
||
// Reopens the database as in-memory representation of given serialized bytes. | ||
// The given *Serialized instance should remain referenced (i.e. not GC'd) for | ||
// the life of the DB since the bytes within are referenced directly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not something the caller can guarantee, short of importing go4.org/unsafe/assume-no-moving-gc.
Generally, it's probably not a good idea to let C and Go memory cross the boundaries like this. It adds complexity including the need for the finalizers, and the invalid state of Serialized values after being used.
In Serialize, I'd always make a copy into Go memory unless SQLITE_SERIALIZE_NOCOPY is set. That way there is never a need for freeing the memory. If an application can't stand copies for performance reasons, they can use SQLITE_SERIALIZE_NOCOPY. Otherwise, they can probably tolerate two as well as one.
In Deserialize, I would always copy the input into sqlite3_malloc64'd memory, and force SQLITE_DESERIALIZE_FREEONCLOSE on. This is unfortunate in terms of memory usage, especially when using //go:embed which is my use case, and basically precludes the use of the szBuf > szDb feature, but it is the only safe API because we can't rely on Go memory staying still.
The good news is that this removes any need for the Serialized type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrmm, yeah all my machinations here were to avoid copies, but it sounds like I can't have a stable pointer. So I guess we have to have copies.
I did this for a side project (https://github.com/cretz/temporal-sdk-go-advanced/tree/main/temporalsqlite for SQLite on https://temporal.io/) so it may be a few days until I can get to simplifying. If needing sooner or you have the bandwidth, feel free to steal the impl here and simplify w/ always-copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, actually it might be worth waiting for https://go.dev/issue/46787. I wonder if we can provide the same API I describe above, and implement it with copies in Go 1.18, and with Pinner in Go 1.19.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
golang/go#46787 has landed.
Agreed. |
// Bytes returns the serialized bytes. Do not mutate this value. This is only | ||
// valid for the life of its receiver and should be copied for any other | ||
// longer-term use. | ||
func (s *Serialized) Bytes() []byte { return s.data } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer more robust API semantics for this. This is too error prone. Assume the user of this API is not going to read your docs. How can we make it fool proof. Consider that there is virtually no case where someone is going to instantiate a Serialized
object and not copy the data. So why bother with the intermediary? Why even have a Serialized object at all?
IMO the ideal API should just accept a Writer where the serialized data is written, or accept a Reader where the serialized data can be read from so it can be instantiated into the database.
if len(s.data) > 0 && s.shouldFreeData { | ||
s.shouldFreeData = false | ||
s.sqliteOwnsData = false | ||
C.sqlite3_free(unsafe.Pointer(&s.data[0])) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
complex logic with no comments. Why is this necessary?
// the database. | ||
// | ||
// https://www.sqlite.org/c3ref/serialize.html | ||
func (conn *Conn) Serialize(schema string, flags ...SerializeFlags) *Serialized { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accept a writer and write the copied serialized data to it.
const ( | ||
SQLITE_DESERIALIZE_FREEONCLOSE DeserializeFlags = C.SQLITE_DESERIALIZE_FREEONCLOSE | ||
SQLITE_DESERIALIZE_RESIZEABLE DeserializeFlags = C.SQLITE_DESERIALIZE_RESIZEABLE | ||
SQLITE_DESERIALIZE_READONLY DeserializeFlags = C.SQLITE_DESERIALIZE_READONLY | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is too low level to expose in the Go API here. When we deserialize we are taking go memory with the serialized DB and malloc-ing a space of C memory for which sqlite can operate an in-memory database. I think the only option we should expose is READONLY and maybe RESIZEABLE. I think since we are managing the C memory, we should control FREEONCLOSE, which we should probably set, so we don't have to continue to manage that memory.
// The Serialized parameter should no longer be used after this call. | ||
// | ||
// https://www.sqlite.org/c3ref/deserialize.html | ||
func (conn *Conn) Deserialize(s *Serialized, flags ...DeserializeFlags) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend this function signature:
func (conn *Conn) Deserialize(s *Serialized, flags ...DeserializeFlags) error { | |
func (conn *Conn) Deserialize(serialized io.Reader, size, additional int, flags ...DeserializeFlags) error { |
And then use size+additional to malloc the C memory and then copy the contents of serialized into that memory.
Set FREEONCLOSE so we don't have to continue to manage that C memory.
Added support for https://www.sqlite.org/c3ref/serialize.html and https://www.sqlite.org/c3ref/deserialize.html.