-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C API extensions #12682
C API extensions #12682
Conversation
This looks fine from the perspective of someone experienced with the extension API. It won’t significantly affect API consumers from the Rust side. I appreciate that this PR allows the extension API to evolve. I suggest adding the ability for DuckDB to support older extension API versions by passing multiple structures upon extension initialization. In the init function, can you extend it so DuckDB can send a list of supported extension API structures, and the extension can choose which version to use? For example, Currently, we lack many features of the C++ extension API in the C API. Some examples are:
Closing these gaps by making the extension API easier to extend would be beneficial. |
The reason tools like I also think we should limit convenience wrappers in the C API and focus primarily on feature parity/performance. When the C API is sufficiently extended we can design wrappers in other languages that build on top of the C API (e.g. Rust/Go, maybe C++ itself) that build on top of this API and provide a better usability experience for writing extensions. Ideally few people would actually need to interact with the C API outside of the developers maintaining the wrappers at that point, making the convenience wrappers at that layer moot. |
I like the idea of maybe having multiple structs be returned as we could compartmentalize parts of the API. e.g. scalar and table functions could have their own API structs that can be evolved independently. Although I guess we would need a common "core" api for types and values anyway so maybe this just causes a bunch of unnecessary complexity. However, I wonder if it's not better to start from scratch with the extension API instead of having it be derived from the current C-API...? As this is a new interface we don't have to worry about breaking ABI, we can remove deprecated apis and generally rethink stuff (like e.g. error handling) with the benefit of hindsight and with a (perhaps) clearer goal of what functionality extensions need. We can also omit other client-related stuff like result collection etc. I know Im not alone in being annoyed at some of the inconsistency that has accrued. I know this might sound daunting but I think we could come up with a draft if we just schedule an afternoon of mob-programming in the office. |
So for this demo I just added every function that is not deprecated to the v0 api struct. I think its a fair point that this would be and execellent point to reconsider if there are parts of the C CAPI we are not happy with and simply not include them in the struct.
That's an interesting idea, but it would indeed make things a bit more complex. Note also that even though we can then more easily change the apis, we can not make a lot of changes anyway, because we need to support both the before and after of every change we make, so we shouldn't be making lots of changes anyway. All in all I'm not super convinced that splitting the structs is worth the effort but let's discuss tomorrow! @rustyconover |
On Load, I would try hard to re-design the init function so that it takes parameters that are compatible between C++ and C API. I think it's better to have a single uniform init function that try to find a given function. Possibly for later, unsure what's the ideas here:
|
Hmm technically c++ extensions can already use the C API init function I wrote here, they can just fetch the DatabaseInstance from the connection directly. Maybe we can indeed unify this right away.
i think i prefer letting duckdb handle the compatibility checking, but i agree that the hard dependency on having the metadata present is not ideal. I would say we have these options:
I went with #3, but i think either of these options could work
we could and probably should have this. Adding it shouldn't be too hard though
good point, i've left this to a future PR. it's implementation-wise completely separate from this PR, i think its just a matter of where to look for extensions for downloading and where to store/load them from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Looks great - some comments from my side
extension/demo_capi/capi_demo.cpp
Outdated
//===--------------------------------------------------------------------===// | ||
extern "C" { | ||
|
||
DUCKDB_EXTENSION_API void demo_capi_init_capi(duckdb_connection &db, duckdb_ext_api_v0 *api) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be a duckdb_database
, and shouldn't be a reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can do something like this instead to keep things extensible (ha);
void demo_capi_init_capi(duckdb_extension_info *info, duckdb_extension_access *access) {
auto db = access->get_database(info);
auto api = access->get_api(info, DUCKDB_API_VERSION);
if (!api) {
return;
}
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be wrapped into DUCKDB_EXTENSION_LOAD_API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duckdb_extension_info info(db);
duckdb_extension_access access_methods;
ext.Load(&info, &access_methods);
if (info.LoadedUnsupportedVersion()) {
throw ...
}
src/include/duckdb.h
Outdated
duckdb_result_type (*duckdb_result_return_type)(duckdb_result result); | ||
bool (*duckdb_value_boolean)(duckdb_result *result, idx_t col, idx_t row); | ||
int8_t (*duckdb_value_int8)(duckdb_result *result, idx_t col, idx_t row); | ||
int16_t (*duckdb_value_int16)(duckdb_result *result, idx_t col, idx_t row); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove all deprecated methods from the struct?
src/include/duckdb.h
Outdated
@@ -3230,6 +3239,610 @@ It is not known beforehand how many chunks will be returned by this result. | |||
*/ | |||
DUCKDB_API duckdb_data_chunk duckdb_fetch_chunk(duckdb_result result); | |||
|
|||
typedef struct { | |||
// Version v0.0.1 | |||
duckdb_state (*duckdb_open)(const char *path, duckdb_database *out_database); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be in duckdb.h
I've processed the comments here. To summarize the changes:
|
Thanks! |
Merge pull request duckdb/duckdb#12682 from samansmink/c-api-extensions Merge pull request duckdb/duckdb#13296 from carlopi/fix_linux_release_checkout Merge pull request duckdb/duckdb#13292 from Mytherin/reducetempdirsize
I wonder if we could get a follow up of this PR that would cover
|
Hey @jangorecki yea there will be follow ups here! Both documentation and testing are lacking still. As things materialize here, I'm also planning to write a blogpost to describe the whole thing in more detail. |
Hey! Just curious, are there any plans to add the filters to the C API? I have a file format (implemented in Rust) that I'd like to write an extension for reading. The format has statistics/zone-maps so it could support predicate pushdown, but AFAICT that's currently not exposed as part of the table function C API. |
Yes, that is definitely on the roadmap |
I'm trying this out in an SQLite-like way, but I'm hitting a few roadblocks. Here is the closes I've managed (see https://github.com/Florents-Tselai/mycext/actions/runs/10849069915/job/30107489251#step:6:5)
I'm also building 1.1.0 from the source and using Makefile. On my Mac though, things look a bit better in that the init function is indeed called,
Am I missing something obvious here? (i.e. have to pass extra flags when building from src) |
@Florents-Tselai did you follow the example extension? https://github.com/duckdb/duckdb/tree/main/extension/demo_capi |
Hi. The |
@ddevienne I guess it should be added there indeed. For now you can just grab the header from https://github.com/duckdb/duckdb/blob/v1.1.1/src/include/duckdb_extension.h pr: #14086 |
How can I determine if a value is null in the C API? It seems there isn't a method for this currently. If I call duckdb_get_xxx methods directly on a null value, it crashes. |
Initial experimental(!) implementation of loadable extensions with C extension API See duckdb/duckdb#12682 for more info on the DuckDB C extension API --------- Co-authored-by: martin <[email protected]>
This (draft) PR stabilizes (part of) the Extension C API. It achieves this by bumping the version in `duckdb_extension.h` to `v1.2.0`(the upcoming DuckDB release) through the header generation mechanism. Additionally, this PR solves a few small issues encountered along the way. ## Notes to reviewers In the current draft state, i'm particularly looking for feedback on the following: We need to thoroughly review which functions are going to end up in the `v1.2.0` stable part and which functions go in the unstable part. Basically this PR is proposing to add all non-deprecated functions that pre-existed to the stable part of the API. The only functions that go into unstable are the deprecated ones and the ones that were very recently added. Then any feedback/ideas/potential issues with: - lifecycle of C API functions - testing mechanism using reference extension Only when this has been reviewed and approved a thorough code review is required ## Background The C extension API is defined by the regular `duckdb.h` header combined with the `duckdb_extension.h` header. In `duckdb_extension.h` we define a struct of function pointers that is passed to an extension on load. Through this struct of function pointers, the extension will be able to call into DuckDB functionality without the dynamic loader needing to lookup the symbols on extension load. For more info, see #12682 ## This PR Before this PR, the version of the Extension C API was separate from the DuckDB one, being set to an arbitrary v0.0.1. With this PR, we bump this version to `v1.2.0`, which will be the next version of DuckDB. As part of this PR, we also slightly change the way versioning of the Extension C API works. ## Extension ABI Types In this PR, extensions can be of 3 different ABI types. Firstly, `ExtensionABIType::CPP` type extensions are coded against C++ API. They are compiled for a single target version of DuckDB and will only work with that version. Most of the currently available DuckDB extensions are of this type. Secondly, we have the `ExtensionABIType::C_STRUCT` type extensions. These are currently produced by the C api based extension templates. These extensions use the C API by requesting the function pointer struct from `duckdb_extension.h`. These extensions specify a **minimum DuckDB version** in their metadata and because the function pointer struct is never modified but only appended to, these binaries are (API-)compatible with all duckdb release after its specified minimum version. (or till the point we decide to "restart" the struct, see step 4 of the lifecycle) Finally, we have a the newly added `ExtensionABIType::C_STRUCT_UNSTABLE` type. Extensions of this type fall sort of in the middle of the first two: while they do use the C API, they rely on the unstable part of the function pointer struct. This means that the extension binaries of this type are actually strictly tied to a specific DuckDB version. This allows building extensions on functionality that we can not yet guarantee to be stable. ## Life cycle of a C API function To understand the process of adding functionality to Extension C API, we will consider the lifecycle of a new C API function. Let's say we want to add a new function that looks like: ```c++ uint64_t duckdb_add_two_numbers(uint64_t a, uint64_t a); ``` ### Step 1: adding the function to the C API To add `duckdb_add_two_numbers` to the C API, we first need to add its description to a (new or existing) json definition file in `src/include/duckdb/main/capi/header_generation/functions`. Next up we need to ensure that duckdb actually implements the `duckdb_add_two_numbers`, by adding its implementation, for example in `src/main/capi/duckdb-c.cpp`. Next, we add the `duckdb_add_two_numbers` function to the exclusion list in `src/include/duckdb/main/capi/header_generation/apis/v1/exclusion_list.json`, because we do not yet want to expose it to the function pointer struct. Finally, we run the `scripts/generate_c_api.py` code generation script to complete Step 1. With step 1 completed, DuckDB now has the `duckdb_add_two_numbers` exposed for linking, but C API extensions are not yet able to use it: because it's not in the function pointer struct. This could be a sensible first step for functionality that is added in a PR to DuckDB that may require several more PRs to be practically usable. ### Step 2: adding the function to the unstable C API struct Let's say we now want to use the example `duckdb_add_two_numbers` function from an extension, but we are not yet ready to promise stability of the function. This is achieved by moving the function from the exclusion list to the unstable api struct defintion in `src/include/duckdb/main/capi/header_generation/apis/v1/unstable/*.json`. After moving and rerunning the codegen, our (simplified) function pointer struct could look like: ```c++ typedef struct { // v1.2.0 duckdb_state (*duckdb_open)(const char *path, duckdb_database *out_database); // .... // These functions have been recently added to DuckDB. They are candidate functions to be added to the stable API #ifdef DUCKDB_EXTENSION_API_VERSION_UNSTABLE uint64_t (*duckdb_add_two_numbers)(uint64_t a, uint64_t a); #endif } duckdb_ext_api_v1; ``` Note that the function was added all the way at the end, and is hidden behind the `DUCKDB_EXTENSION_API_VERSION_UNSTABLE` define. This means that `duckdb_add_two_numbers` can only be used by extensions of the `ExtensionABIType::C_STRUCT_UNSTABLE` type. Also, it's important to understand that `C_STRUCT_UNSTABLE` extensions themselves are not necessarily "unstable" in the sense that there are not production quality: the only thing unstable means in this context is that it uses the unstable part of the function pointer struct, meaning that recompilation is required for every DuckDB version. Furthermore, **functions are alllowed to remain in the unstable part of the function pointer struct in stable releases of DuckDB**. Finally, functions in the unstable part of the struct can be removed from the struct in future versions without any problems. ### Step 3: stabilizing a function Now we are finally ready to stabilize `duckdb_add_two_numbers`. To do so we move the function from `src/include/duckdb/main/capi/header_generation/apis/v1/unstable/*.json` to a newly created `src/include/duckdb/main/capi/header_generation/apis/v1/v1.x/v1.x.y.json` file, where `1.x.y` is the upcoming release. Rerunning the codegen will produce a function pointer struct like: ```c++ typedef struct { // v1.2.0 duckdb_state (*duckdb_open)(const char *path, duckdb_database *out_database); // .... // v1.x.y uint64_t (*duckdb_add_two_numbers)(uint64_t a, uint64_t a); } duckdb_ext_api_v1; ``` After the code moving `duckdb_add_two_numbers` to stable is in duckdb/duckdb, it is no longer allowed to change it, for this would break the forwards compatibility of the binaries targetting `v1.x.y` and up. This means that merging PRs that add to the stable part of the function pointer struct should always be review with utmost care. When in doubt, functionality should remain in the unstable part of the function pointer struct. Interestingly, in the run-up to duckdb release `v1.x.y`, it **is** allowed to add to `src/include/duckdb/main/capi/header_generation/apis/v1/v1.x/v1.x.y.json` in multiple PRs, as long as they are appended to the end. Also, this means that as soon as functions are added to the *unreleased* `v1.x.y` part of the struct, `ExtensionABIType::C_STRUCT` extensions relying on that function can already be compiled by targeting `v1.x.y` as minimum duckdb version. ### Step 4: removal/modification of a function? Removal (or changing the signature) of a function that is part of the stable API is (by design) discouraged and not really possible. However if it turns out to be absolutely necessary to stop support for one of the functions in the stable function pointer struct, we have several options. The first option is to replace the function with a `NOP` operation. For example a function that returns a `duckdb_state` can simply always error. Extensions are expected to check the `duckdb_state` result from all c api calls, so they should gracefully handle the error, which will then partially or fully break functionality of the extension. Note that this solution is fairly fragile as it could break extensions relying on the function in unexpected ways without a good way to detect this. The second options is to do a complete regen of the function pointer struct. This will require DuckDB to do an additional check where it refuses to load `ExtensionABIType::C_STRUCT` type extensions with a minimum version from before the struct was regenerated. Note that the situation where DuckDB may need to refuse loading extensions with a too low minimum version might also be necessary for example if compatibility between extensions of certain versions and duckdb arise due to toolchain problems A third options is to add a new extension type such as `ExtensionABIType::C_STRUCT_V2` where a completely new struct is provided. This will allow supporting both the old and the new behaviour. To conclude, while ideally dropping support for any of the stable functions is not super pleasant, we do have options if the burden of supporting functions becomes too large. ## Testing This PR updates the `demo_capi` extension so that it can produce both a `C_STRUCT` and a `C_STRUCT_UNSTABLE` extension. This was used to validate the code in this PR. However, proper testing is something that will be done in a follow up PR. The idea is that we are adding a so-called "reference extension" which now lives in `https://github.com/duckdb/reference-extension-c`. This extension will be able to produce extension of both the stable and the unstable api, with branches/tagged releases that target every DuckDB release. DuckDB's CI will then be able build/download these reference extensions and run the test suite to confirm that the whole extension API is working as expected and the API-compatibility is guaranteed. ## TODOs after merge - update c-api/rust extension templates to support the unstable api builds - update the reference extension to support the unstable api builds - add CI to duckdb/duckdb that builds the reference extensions and runs its tests.
This PR:
duckdb.h
to use code-generationWhy?
The C Extension API solves 3 major problems with extensions:
Linking DuckDB into every extension
Currently every extension has a (part of) DuckDB linked into it statically. When an extension's init function is called, a pointer to the DuckDB instance is passed to the extension code. However, to interact with
this pointer to the instance in any meaningful way, requires to be able to call into DuckDB code. There are two ways this can be achieved when extending the C++ API: Firstly, by relying on the RTLD_GLOBAL flag. This flag
can be set when dlopening binaries to ensure the dlopened extension can resolve its missing symbols against those in the parent process, i.e. DuckDB. Unfortunately we can't always control whether DuckDB itself is opened with RTLD global: this would require us to hack this into every client (because the default on MacOS is RTLD_LOCAL). The second way is what we currently do: statically link a copy of DuckDB into every extension. While this is super wasteful with regards to the binary size, it turns out to be quite effective in distributing extensions to many platforms.
Now using the extension C API implementation described here, this problem would be circumvented completely: the extension can call into duckdb using only function pointers: no linking required there! This makes the extensions really small! for example the extension in my demo branch compiles to a 17K binary.
Rebuilding extensions for new DuckDB versions
C++ API Extension are now tied tightly to the DuckDB version: Because we make no guarantees about C++ API stability and also because of how we link DuckDB into every extension binary, all extensions need to be recompiled for new versions of DuckDB. With C api extensions, this is no longer the case. We can simply build the extension once, and as long as DuckDB supports that version of the C API, the extension binary will be loadable.
Requiring C++ for writing your extensions
Cleanly writing extensions in C, Rust, Go will be possible building on top of the C API. By using the "passing a c struct" method from SQLite, we remove the need to link with duckdb during build time whatsoever, making things much simpler, allowing extension to be created purely using the extension header file.
Implementation
Overview
The basic idea of this is that instead of relying on dynamic or static linking to call into DuckDB functions, the extension init function gets passed a struct containing all available functions. This would
look a little like:
The code for a C Extension API based extension is very straightforward, due to some macros:
Checkout the capi_demo extension added in this PR for an example.
Code generation
I've split up the C API across two files:
duckdb.h
is the original capi header fileduckdb_extension.h
is the header that extensions need to include, it contains the struct-of-function-pointers and typedefs to the original extension names.The code is generated from
json
files which are insrc/include/duckdb/main/capi/header_generation/
. There are 2 types of json files. function definition files define all possible functions that are in the C API. These can potentially be deprecated. api definition files contain the definition (and order) of the C API struct.