Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rust extensions based on Extension C API #370

Closed
samansmink opened this issue Aug 8, 2024 · 15 comments
Closed

Rust extensions based on Extension C API #370

samansmink opened this issue Aug 8, 2024 · 15 comments

Comments

@samansmink
Copy link
Collaborator

Hi 👋

I've been working on the new C API for extensions in duckdb/duckdb (see duckdb/duckdb#12682). TLDR: we've added a header to duckdb which contains basically a struct of function pointers with most of the C API. This struct is then passed to a loadable extension on load. This is basically the same mechanism that sqlite has.

Goals

One of the goals of this API is to be able to write clean pure-rust extensions for DuckDB that can be used across multiple duckdb versions. Basically I want an extension template like https://github.com/duckdb/extension-template, but than in pure-rust based on this new C API.

Now that that PR is merged, I would like to get started on this.

Approach

Good for me, rusqlite, the repo this one is based on, is way ahead of us and already has support for sqlite's variant of the extension api struct since rusqlite/rusqlite#1362.

So in theory, all we need to do is copy (the parts we like) of rusqlite for this to generate the loadable extensions. Then tie everything together for the

The catch

The main catch is my Rust knowledge is pretty minimal. Therefore this will also be a bit of a rust-learning-experience for me. However, given the fact that it's mostly a copy-and-paste exercise from rusqlite, this should be fine? ;)

finally, any help or advice is very welcome!

@0xcaff
Copy link
Contributor

0xcaff commented Aug 14, 2024

hey hey! glad to see there's some progress on this. I'd love to get this landed. Are you looking for someone to pair with on the rust bits? Are you looking for someone to do the rust side of the implementation? I'm happy to do either one or even hop on a call and figure out the best path forward (I can allocate 1 day of time this week towards getting this landed). What would be most useful for you?

Broadly it seems like the big parts are

  • Codgen nice rust accessor functions for our version of *mut ffi::sqlite3_api_routines (looks like bindgen + some custom codegen on top)
  • extension init
  • bridging stuff between existing api and new api (they might be the same fns?)

I'm in US Central and available during daytime hours (9am to 9pm), let me know when a good time is for you over the next few days.

This was referenced Aug 14, 2024
@samansmink
Copy link
Collaborator Author

Hey @0xcaff, That's great to hear! Lets setup a quick call I would love to get your take on this. I will drop you an email.

I have also fiddled with it and I have a (very messy) PoC working here: https://github.com/samansmink/duckdb-rs/tree/poc-rust-c-extension-api. I think it looks pretty similar to your direction, but I also added the attribute macro for the entrypoint, handled the version properly and added a demo function to the examples which uses the new entrypoint. I actually managed to load the extension from DuckDB too! :)

So basically the end-to-end is there, now its a matter of thinking about how to get things integrated cleanly. One of the things that I find a little awkard about the way things are setup in my feature branch now is that the libduckdbsys package is now basically fundamentally different based on this loadable_extension feature, making it not really a feature but almost transforming it completely into a different package.

@0xcaff
Copy link
Contributor

0xcaff commented Aug 15, 2024

Nice! It looks like my changes are pretty similar to yours. I think the open questions from me are

  • Versioning. It seems like you allow specifying a min-version in your proc macro but it seems unused.
  • Bundled build. Neither of our implementations support using a bundled version of the bindings.
  • Deprecated functions. Why are deprecated functions which are still exposed in the C ABI (?) not exposed in the new API?
  • Why does rustqlite use a bunch of atomic values instead of a single one for the entire API struct?

As for replacing all implementations of functions in the package, I don't think its a big deal as if you want both versions to co-exist you could import the crate twice with different features and also because all the stuff in the -sys crate is only wrappers around the C ABI. We can talk more about this on Friday.

@0xcaff
Copy link
Contributor

0xcaff commented Aug 16, 2024

Notes from our call

  • The new C API allows for specifying a version here. https://github.com/samansmink/duckdb-rs/blob/eeca07d8f7ba6c6b9a0a10f4c92c3f87e319487f/crates/libduckdb-sys/build.rs#L426 Extensions declare a minimum supported duckdb API version and duckdb returns a compatible API struct.

  • Bundled build. This needs to be figured out.

  • Deprecated functions. Sam plans to consider adding them back as it looks like the deprecated functions are used pretty deeply in the rust bindings currently and seems like its not trivial to update them or flag them out.

  • Atomics. Concluded fine either way (atomics or using mut static) as we couldn't come up with usages where the atomics protect you from anything during the happy path (considered duckdb plugin init being called multiple times (it is not), calling code pre-entrypoint (you're on your own in this case)). My vote is for the mut static as this is an implementation detail. I think the rustsqlite ppl were trying to avoid unsafe even in their implementation details (which imo is not worth the runtime overhead).

Thanks for taking the time to bring me up to speed Sam!

I'll take on getting the bundled build working.

@parkerdgabel
Copy link

Hi! I'm looking to start an extension in rust using the c API. I'm available to help with anything to move this along!

@0xcaff
Copy link
Contributor

0xcaff commented Aug 21, 2024

@samansmink I made a PR into your repo with bundled build working. Tagging it here in case it got lost samansmink#1

Can't wait to get this all landed when you're back from holiday!

@samansmink
Copy link
Collaborator Author

that's great, @parkerdgabel! I will ping you for a review once the PR lands, some more eyes on this is definitely helpful

@0xcaff
Copy link
Contributor

0xcaff commented Aug 27, 2024

Saw you merged the work in progress PR into your branch! Let me know if you are finding issues with it, happy to help.

@parkerdgabel
Copy link

Great @samansmink ! I have the beginning of my extension here if you would like a repo to test on. I'm not totally sure how to compile for consumption by duckdb. I was working off your poc branch. I’d love to help get this extension template working

@0xcaff
Copy link
Contributor

0xcaff commented Sep 5, 2024

I've setup a usage of the c extensions API in my project (vendored duckdb-rs) and I found a few things we missed.

  1. https://github.com/0xcaff/duckdb_protobuf/blob/1a2bb08d445daa5d2400301351a7c89cc8c37fe0/patches/libduckdb-sys%2B1.0.0.patch#L102-L114 If buildtime_bindgen is disabled, and loadable_extension, we gotta switch here. We missed this in both our POCs. Only discovered when disabling bundling.
  2. I added handling for the minimum_duckdb_version value here https://github.com/0xcaff/duckdb_protobuf/blob/1a2bb08d445daa5d2400301351a7c89cc8c37fe0/patches/duckdb-loadable-macros%2B0.1.2.patch#L53-L58 https://github.com/0xcaff/duckdb_protobuf/blob/1a2bb08d445daa5d2400301351a7c89cc8c37fe0/patches/duckdb-loadable-macros%2B0.1.2.patch#L78C71-L78C101

There's still a bunch of rough edges.

  • There are so many features I'm not even sure what they all are. Bundled, buildtime bindgen, loadable. I think there's 2^3 valid feature sets now. I think there's room to compress them now somehow?
  • The builtin bindgen headers are a pain to update (used when buildtime bindgen is off). I turned off tests and generated them manually. It seems like there are some failing when loadable is enabled.

What happens to old extensions following the ABI of the new extensions? It seems like they are all going to break? This seems suboptimal, is this really what you want?

D LOAD '/Users/martin/projects/duckdb_protobuf/target/release/protobuf.duckdb_extension';
Not implemented Error: Enum value: '' not implemented

It is because of the code here trying to check the abi

https://github.com/duckdb/duckdb/blob/0bc27aba2cfd2fb3dd3637b4e4d28f2d6c34db74/src/main/extension/extension_load.cpp#L197

@0xcaff
Copy link
Contributor

0xcaff commented Sep 5, 2024

here's what i did to get stuff working in my project btw 0xcaff/duckdb_protobuf#17

@carlopi
Copy link

carlopi commented Sep 13, 2024

A bit lateral, but the mentioned problem where error message would be cryptic on extension mismatch have been improved in the main of duckdb, to become v1.1.1 by duckdb/duckdb#13894.

@parkerdgabel
Copy link

Btw I forked the project and started my extension. It is working for me. I also added Scalar function support to the library. Here is my extension using the fork
https://github.com/parkerdgabel/quackML
Here is a short video of logistic regression now running in pure rust through duckdb.
https://github.com/user-attachments/assets/ccc1e33a-0ae6-4c0b-9d84-38913754991d

I've continued to work on the fork in the meantime but can still help with this.

As an aside, do any of you know how I can write a Vec to a flatbuffer of list type?

@samansmink
Copy link
Collaborator Author

(experimental) Template is now up here: https://github.com/duckdb/extension-template-rs

With that, this first issue can probably be closed. Please feel free to open new issues for further improvements!

@matthewgapp
Copy link

Somewhat related. I created a PR to support scalar UDFs based on patterns from this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants