Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add defstruct macro #14

Draft
wants to merge 43 commits into
base: master
Choose a base branch
from
Draft

Add defstruct macro #14

wants to merge 43 commits into from

Conversation

rutenkolk
Copy link
Contributor

@rutenkolk rutenkolk commented Oct 13, 2024

[Note: I would consider this a draft PR, as I have not yet added tests (which could unearth some bugs and necessitate appropriate fixes).]

Hi, this PR contains the addition of a defstruct macro. It does the following:

  • It adds serialize and deserialize code to a serde "registry" for the new type (details below)
  • It generates a new type that has the specified members (details below)
  • It adds an implmentation for c-layout
  • It adds inline implementations for both deserialize-from and serialize-into
  • It adds an implementation for clojure.pprint/simple-dispatch

serde registry

The "registry" is implemented via the multimethods generate-deserialize and generate-serialize which produce code to de/serialize the respective types. This removes indirection in the de/serialize code for types that use other types. i think in the original discussion we were on the same page, but thought the other meant something different. The defstruct macro adds implementations to the multimethods for the newly generated type.

the generated type

The new type is generated via deftype in the private function generate-struct-record. This is an attempt to strike a middle ground between the two positions of the original discussion, although the result might be a bit odd:

  • The type implements both IPersistentVector and IPersistentMap.
    • The basic idea is: if it is treated like a vector, it behaves like a vector. if it is treated like a map, it behaves like a map.
  • It therefore implements both vector-like methods like nth as well as map-like methods like without (for e.g. dissoc).
  • If there is a an overlap in map/vector interface such as with assoc, it supports both paradigms of indices-as-keys and membernames-as-keys. Practically speaking, if you use something like assoc with a number as a key, it behaves like a vector (and will return a vector), otherwise like a map (and will return a map).
  • one notable exception here is foreach which can't support both paradigms, and it is therefore implemented as if it's a vector. The rationale here is that the value of the type is composed of the actual values of the members, not the associated names of the places of the values. If you map or reduce over an object of this type, you will do so over the values of the members.

with-c-layout

There was one implementation problem. Since padding was needed to be taken into account to allow for inline serdes, the new code for the macro needed to rely on with-c-layout. The problem here is that with-c-layout is in the layout namespace which already depends on mem. As a stopgap solution i simply copied the function over as a private function. I would be in favor of actually deprecating the layout namespace. for backwards compatibility the with-c-layout function in layout could depend on the one in mem. Not only is the layout namespace at this point somewhat anemic, it has also caused me trouble. I'm not sure if it's a bug, but due to with-c-layout being in layout, i ran into the problem that there are now two different :padding keywords, which i found confusing.

tests & benchmarks

No tests or benchmarks exist right now. I don't expect the custom type to be slower than defrecord, but I want to test it.
Similarly, i do want to add a first set of tests for the de/serialization.

@rutenkolk rutenkolk marked this pull request as draft October 13, 2024 21:19
@IGJoshua
Copy link
Owner

I absolutely love this. You've done a fantastic job! I look forward to seeing the tests that you add for this, and I'm also thinking ahead to reorganizing some of the existing serde code to use the new generate-serialize and generate-deserialize to add some inline arities to serialize and deserialize when the type argument is a constant. Don't worry about any of that in this PR, it's just something I want to use this for in the future.

@IGJoshua
Copy link
Owner

IGJoshua commented Oct 14, 2024

So I like the way you've chosen to introduce the type registry. It integrates well with the existing tools, and provides a way to do inline arities for serialize and deserialize in the future. One hesitation I have at the moment looking over the code is the use of ::mem/array to refer to actual java arrays.

Arrays

Currently in coffi the ::mem/array type serializes anything seqable, which does include arrays, and it deserializes to a vec.
I think that to support the direction you're going here, we should add optional kwargs as options in the array type, with a :raw? true option meaning that it will deserialize to a JVM array and will assume that the argument is an array, and then add some conditionals in to ensure that we have the fast path for array serialization.

I also think that your compromise around using a record-like type with both map and array style accesses is appropriate and well-done. I might personally want to go the other direction with the foreach implementation though, making it act as if it's reducing over a sequence of map entries. Doing it this way allows adding a quick map val into the stack without too much performance overhead, and it avoids the need to figure out a zipmap with the keys and values separately. I don't have too strong an opinion on this one though as long as keys returns the keys in the same order as foreach yields the values.

serde registry

The serde registry as it stands with generate-serialize and generate-deserialize both look pretty good in terms of usage and follow about what I want them to do, but I want to note two things about them that I'm not sure how I feel about right now.

The first is just an observation and not a problem, that being that these functions all generate the equivalent of a serialize-into or deserialize-from call, which I think is appropriate, I'm just thinking about what this might mean in terms of naming though if the generate-x functions are going to become a part of the public api of coffi.

The second is that these macros as they stand are unhegenic macro helpers. I think it would be appropriate for the multimethod to take in the symbol which will be used to refer to the segment.

with-c-layout

For the with-c-layout problem, I think there's a couple things to be done. To start with, we can make the private version in coffi.mem use :coffi.layout/padding explicitly which doesn't require the namespace be loaded, which reduces it to just one padding key. Then for the rest, I'm a little undecided about it.

All the structs being passed over the C abi will most certainly use the with-c-layout layout, however the intention behind having the namespace in the first place was to allow easily serializing clojure maps into e.g. std140 or std430 from the GLSL spec, I just haven't gotten around to implementing those yet as I was wanting to get a defstruct macro and some codegen for an opengl bindgen library first.

Specifically though, if we remove the coffi.layout namespace and just assume everything is the c-layout, that will then mean there's no way for a user of the library to reach lower in the abstractions to implement a different layout for their usecase except to re-implement defstruct for their own layout.

@IGJoshua IGJoshua mentioned this pull request Oct 14, 2024
@rutenkolk
Copy link
Contributor Author

I've been hacking away at this more.

I really like the idea of the optional keyword. I don't know what the default for the raw option should be. I've named it raw-arrays? for now, but i can switch it to the original raw. For now I defaulted to raw-arrays? true, which i can obviously change.
In any case, both versions work, also if mixed, like one defstruct using raw arrays and another one not using it but referring to the first defstruct.

Regarding the generated type: I've ran into issues with more implementations of the IPersistentMap interface colliding with IPersistentVector than i anticipated (like keys and vals for example relying on very specific seq behavior). I've defaulted to complete map like behavior if needed and removed the IPersistentVector interface from the type. But i've also still implemented the vector methods, but named differently, exposed via a new interface coffi.mem.IStructImpl. There is a new StructVecWrap type which implements IPersistentVector instead of IPersistentMap and which can take such a IStructImpl and offer the vector API with very little overhead, calling the vector methods of IStructImpl. to convert a type constructed with defstruct to this efficient vector implementation, and back, there are the new functions as-vec and as-map. The performance is really good, even with the indirection being as quick or quicker than native vectors in pretty much all cases. Similar optimisations have been done to generating seqs of the map or vector version of the type, being faster than seqs on maps and vectors respectively. I'll follow up with some benchmarks

the serde registry is unchanged for now, i'll have to think about nicely making them hygenic. it would definitely make things more complicated.

i also don't have a solution for the with-c-layout problem. one thing that confused me a bit from your answer though, is that ::layout/padding seems to be intentionally used in coffi.layout. Since there are also implementations in coffi.mem for c-layout, serialize-into and deserialize-from for ::padding (::mem/padding) i was under the impression that ::layout/padding was a typo and ::mem/padding was the "correct" version of that keyword. since lots of code uses namespaced keywords i would strongly prefer deciding on one.

I still have to clean up a bit, and the above mentioned things aren't addressed yet, but functionality wise, i think things are mostly done and tested.

attached is a file with some random criterium runs that i've done to test the speed of stuff: results are promising. the new defstruct is faster than records, maps and sometimes vectors and seem to improve the speed of calling native functions with struct arguments / return types.

mini-assorted-benchmarks.txt

@IGJoshua
Copy link
Owner

I've been following along as you've been adding commits, and it's been really fun to watch as it grows. Thanks for the wrap-up, it's helped clear everything up in my head though, since I don't usually haven enough time to fully dig into and understand the diffs when I do that.

I think I'd like the :raw-arrays? to default to false just to comply with the default behavior elsewhere, since a goal of coffi is for calling native code to feel like calling Clojure code. I think the naming is good if you put the argument on the struct itself, I had originally been thinking about a [::mem/array ::whatever-type 3 :raw? true] type extension to say that the native array will deserialize to a raw java array, similar to how the [::mem/fn [] ::mem/void :raw-fn? true] type argument disables automatic serdes on the argument and return types of the function as it crosses native boundaries. If we were to do it that way, we could collapse the coffi types for vector-array and array down into a single type, and it would even allow structs to have some arrays which are native and some which are not, which allows for more possible usecases. I can add support for these raw arrays to the basic types so you only have to worry about how they interact with structs.

I really like the solution of adding as-vec and as-map as functions you can call on the result to convert between the types, and I'm really happy to hear about the performance of this! I looked at the mini benchmarks and they looked really good.

If making the generation functions hygenic would be a significant amount of work then I would be comfortable with delaying that work and ensuring that the generation functions aren't a part of the public api of coffi for now.

So the with-c-layout the two different padding keywords are for different things. ::mem/padding is a type which comes from the panama PaddingLayout which accounts for bytes without meaning just there to take up space. Then ::layout/padding is a field name assigned to all padding "fields" within a struct. It's not intended to be kept or used for anything, and multiple padding fields get destructively written to the same key on structs as they're deserialized as it stands, and in the case of a defstruct implementation I'd actually want to just not have the padding field appear in the result at all, even though padding layout elements would still need to appear in the layout of the struct itself.

I'd like to thank you again for taking on this feature! I really like the work you're doing. :)

@rutenkolk
Copy link
Contributor Author

rutenkolk commented Nov 1, 2024

Thanks for the feedback! I'm glad you like the progress so far.

I think I simply didn't fully get how you meant the :raw option to work, but I really like the [::mem/array ::whatever-type 3 :raw? true] solution over vector-array (which i basically bolted on to make things work without touching the code for ::mem/array, which i wasn't 100% happy with yet either).

I also think your rationale on raw or raw-arrays? to default to false is probably for the best. That does make sense in the context of the library. There are some practical concerns that i had myself once i tested things more extensively, most notably probably is that raw arrays don't play nicely with =.

Thank you for also clearing up that misunderstanding on the whole :padding situation! The reason i deal with the padding in the defstruct macro at all is to precompute where to (de)serialize the values are in the segment. I agree it shouldn't be emitted from the macro though.

also, i think i forgot to mention it, but i really like the idea of supporting different layouts, especially in the context of OpenGL / graphics development! so with all that information, the layout namespace makes a whole lot more sense to me!

i'll continue on the PR then and update when i hit what feels like the next milestone

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants