-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
start implementing FFI #91
start implementing FFI #91
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to:
- Decide on a strategy for error handling
- Wrap more of the Rust API in C-friendly structs. (This will be a lot of code, and it's why I wanted the FFI bindings to be a separate crate.)
ffi/src/lib.rs
Outdated
// TODO: We REALLY don't want to expose the real type of default client. Doing so forces every kernel | ||
// API call to take either a default client or a generic client, because those are incompatible types. | ||
use deltakernel::client::executor::tokio::TokioBackgroundExecutor; | ||
type KernelDefaultTableClient = deltakernel::DefaultTableClient<TokioBackgroundExecutor>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we should just wrap this as a generic client.
ffi/kernel.h
Outdated
typedef struct EngineIterator { | ||
void *data; | ||
/** | ||
* A function that should advance the iterator and return the next time from the data | ||
* If the iterator is complete, it should return null. It should be safe to | ||
* call `get_next()` multiple times if it is null. | ||
*/ | ||
const void *(*get_next)(void *data); | ||
} EngineIterator; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this was introduced earlier, but it looks like engine iterator is missing a release
/free
method to cleanup any allocations it may have made.
ffi/src/lib.rs
Outdated
#[no_mangle] | ||
pub unsafe extern "C" fn get_default_client( | ||
path: *const c_char, | ||
) -> *const KernelDefaultTableClient { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few points:
- Error handling: From what I've seen, usually C APIs will return an error code to indicate whether the operation was successful. That seems to be how FUSE handles errors. In the ADBC C API, we return an error code and also have a pointer to an error struct that allows filling in more information like an error message (example function signature).
- Allocating types: This function shoves the table client into an
Arc
, and then returns it as a raw pointer, giving the caller no way to release the data when it's done with it. That seems to be an inevitable memory leak. Instead, we should probably give the C API a struct it can allocate and we can initialize:
#[repr(C)]
struct TableClient {
data: *mut c_void,
get_file_system_client:
extern "C" fn(data: *const TableClient, out: *mut FileSystemClient) -> StatusCode,
...
/// Tell the table client to release the resources
release: extern "C" fn(self: *mut TableClient) -> (),
}
#[repr(C)]
struct FileSystemClient { ... }
The FFI TableClient
can then easily implement the trait kernel::TableClient
by calling the function pointers. For example:
impl kernel::TableClient for TableClient {
fn get_file_system_client(&self) -> Result<Arc<dyn kernel::FileSystemClient>> {
// Initialize an empty FFI filesystemclient
let mut fs_client = FileSystemClient::empty();
let status = self.get_file_system_client(&self, &mut fs_client);
match status {
Status::Ok => Ok(Arc::new(fs_client)),
Status::Err => Err(Error),
}
}
}
Then for us to provide the default client, we just need to wrap KernelDefaultTableClient
in the FFI wrapper:
#[no_mangle]
pub unsafe extern "C" fn get_default_client(
path: *const c_char,
out: *mut TableClient
) -> StatusCode {
let path = unsafe { unwrap_and_parse_path_as_url(path) };
let Some(url) = path else {
return std::ptr::null_mut();
};
let table_client = KernelDefaultTableClient::try_new(
&url,
HashMap::<String, String>::new(),
Arc::new(TokioBackgroundExecutor::new()),
);
out.get_file_system_client = default_client_get_fs_client;
out.release = default_client_release;
out.data = Arc::into_raw(Arc::new(table_client))
}
unsafe extern "C" fn default_client_get_fs_client(
table_client: *mut TableClient,
out: *mut FileSystemClient
) -> StatusCode {
todo!("wrap the filesystem in the FFI FileSystemClient and copy to provided pointer")
}
unsafe extern "C" fn default_client_release(
table_client: *mut TableClient,
) -> StatusCode {
// Converting data back to Arc so ref count will be decremented when dropped
// at end of this function.
let _client = unsafe { Arc::from_raw(table_client.data) }
table_client.data = std::ptr::null;
// Make release callback null to make this struct as having been released.
table_client.release = std::ptr::null;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the basic idea I settled on as well... but it's easier said than done:
- If the engine wants to provide a table client, they need a way to instantiate an
Arc<dyn TableClient>
. The vtable emulation stuff that safer-ffi came up with seems like the right "flavor" of approach... except that it seems to only handle exporting a trait to C++, and does not provide any obvious way for C++ code to implement traits. - Even if we do manually code up the vtable for
TableClient
as you suggest, there's still the problem thatTableClient::get_expression_handler
returnsArc<dyn Expression>
... same problem all over again - ... and
Expression::get_evaluator
returnsArc<dyn ExpressionEvaluator>
... so again - ... and
ExpressionEvaluator::evaluate
method returnsDeltaResult<RecordBatch>
, and there's also no way to express a rust enum in C/C++. For example,safer-ffi
doesn't even try to support non-trivial rust enums right now: https://getditto.github.io/safer_ffi/derive-reprc/enum.html#more-complex-enums ("Are not supported yet.") - ... and even if we figure out the
DeltaResult
part, thatRecordBatch
is an arrow API class that will soon enough change toArc<dyn DataBatch>
or similar.
(and that's just one path through the tree of structs and methods)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrt. all the various different "handlers" or "clients" we need to return, I think in the end it will be on the order of 10-20 function pointer assigners we'll have to implement. That doesn't seem to bad all together. (fuse has about 45). I think having to have a get_file_system_client
, and a get_expression_handler
, and all the other get_
s isn't unreasonable.
For the result stuff, I think we are stuck with the "old" way of doing things in C, which is to say, null
means "error" and otherwise it's a pointer to valid data.
It would be nice if we could still return a StatusCode, with the convention that we will only have filled in an out
pointer if the return code is valid, but I'm not sure we can create a null pointer to a trait object to fill in, because *mut Trait
is a fat pointer.
So we'll need to have a slightly different signature in the FFI. So an EngineEvaluator
would maybe look like:
#[repr(C)]
struct FFIEvaluator {
internal: *mut c_void,
evaluate: extern "C" fn(internal: *mut c_void, data: *const EngineData) -> *const EngineData,
}
impl kernel::ExpressionEvaluator for FFIEvaluator {
fn evaluate(&self, batch: &dyn EngineData) -> DeltaResult<Box<dyn EngineData>> {
let batch: *const EngineData = batch;
let result = self.evaulate(self.internal, batch);
if result.is_null() {
Err(Error)
} else {
unsafe { Box::from_raw(result) }
}
}
}
We could potentially improve the error messages by allowing engines to set errno and using https://doc.rust-lang.org/std/io/struct.Error.html#method.last_os_error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that looks about like what I was imagining as well. The main things on my mind:
- It seemed worthwhile to wait for your EngineData PR to land, because it cleans up a lot of messes that make FFI harder (and also would churn any FFI I did try to write). I see a big block of empty time on my calendar today I hope I can use to finally review it.
- I think I figured out a way to define error handling at (FFI) engine interface level. Basically, the FFI engine interface defines a method for kernel to allocate an exception object, which kernel guarantees to immediately return to the engine (no other intervening code that could fail and cause the error to leak). That would allow an idiom similar to rust
Result
. An engine that supports C++17 (**) could even use `std::variant<T, EngineError*>. But prototyping that needs to wait until we figure out the FFI engine interface in general. - How can we define our FFI engine interface API in such a way that engine actually has the choice to request a default client from kernel or use its own? Ideally, engine even be able to mix and match (e.g. use native parquet+json readers, but leverage defaults for expression eval etc). This seems tricky, because we don't want to add so many layers of indirection that would come from "just" doing a round trip through an engine vtable format. Especially when it comes to error handling, where converting from kernel error to engine errors will necessarily be lossy (because it seems unwise to pass arbitrary rust exceptions across the FFI boundary).
In many ways, 3/ is my biggest concern, to be honest.
(**) Sadly, duckdb is only C++11 and so couldn't use std::variant
-- which is kind of ironic because they define a SQL UNION type that is specifically described as
similar to C++17’s
std::variant
, Rust’senum
or the “sum type” present in most functional languages
|
||
fn visit_expression_binary( | ||
state: &mut KernelExpressionVisitorState, | ||
op: BinaryOperator, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make BinaryOperator
be #[repr(C)]
(or something that we into()
a BinaryOperator
)? Then we could expose this function directly and not have to have all the visit_expression_X
ones below.
This would work for the ones were all we do is visit_expression_binary(state, BinaryOperator::X, a, b)
. I know some do other things and so would need to either remain as is or have more logic in here.
Would cut down on code and api surface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my original inclination as well, but:
- If we require engine to pass a generic binary operator, we replace N method definitions with one method definition plus N enum variant definitions.
- Even simple enums are unsafe across the FFI boundary, because Rust assumes they never contain "out of band" values, but C/C++ could send any arbitrary value that fits in the enum's backing integral type. We could "solve" that by validating the raw enum type before using it, but that's extra code and extra failure cases that both sides now have to worry about (plus it would require us to add error handling capabilities that don't currently exist).
Taken together, it seemed strictly worse than the function-based approach I ended up pursuing here.
Granted, we'll eventually need to worry about error handling, because engine could pass invalid child ids (not tracked by inflight_expressions
), but that error path seems a lot less likely to reach prod because engine isn't interpreting those values. It just needs to not lose track of them.
ffi/src/lib.rs
Outdated
#[no_mangle] | ||
pub unsafe extern "C" fn get_default_client( | ||
path: *const c_char, | ||
) -> *const KernelDefaultTableClient { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrt. all the various different "handlers" or "clients" we need to return, I think in the end it will be on the order of 10-20 function pointer assigners we'll have to implement. That doesn't seem to bad all together. (fuse has about 45). I think having to have a get_file_system_client
, and a get_expression_handler
, and all the other get_
s isn't unreasonable.
For the result stuff, I think we are stuck with the "old" way of doing things in C, which is to say, null
means "error" and otherwise it's a pointer to valid data.
It would be nice if we could still return a StatusCode, with the convention that we will only have filled in an out
pointer if the return code is valid, but I'm not sure we can create a null pointer to a trait object to fill in, because *mut Trait
is a fat pointer.
So we'll need to have a slightly different signature in the FFI. So an EngineEvaluator
would maybe look like:
#[repr(C)]
struct FFIEvaluator {
internal: *mut c_void,
evaluate: extern "C" fn(internal: *mut c_void, data: *const EngineData) -> *const EngineData,
}
impl kernel::ExpressionEvaluator for FFIEvaluator {
fn evaluate(&self, batch: &dyn EngineData) -> DeltaResult<Box<dyn EngineData>> {
let batch: *const EngineData = batch;
let result = self.evaulate(self.internal, batch);
if result.is_null() {
Err(Error)
} else {
unsafe { Box::from_raw(result) }
}
}
}
We could potentially improve the error messages by allowing engines to set errno and using https://doc.rust-lang.org/std/io/struct.Error.html#method.last_os_error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, this has some great ergonomic tricks for the ffi.
leaving the comments I have so far, and will continue to review.
ffi/src/lib.rs
Outdated
/// | ||
/// The slice must be a valid (non-null) pointer, and must point to the indicated number of | ||
/// valid utf8 bytes. | ||
unsafe fn from_slice(slice: KernelStringSlice) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd have a preference for this to be more "try" oriented. If it returned a DeltaResult<String>
you could use from_utf8
rather than from_utf8_unchecked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was originally using from_utf8
, but there are already so many other things that can go wrong in this unsafe code it seemed almost trite to return an ExternResult
, which greatly complicates call chains (**), just for this one detail?
(**) Creating an ExternResult
requires access to an AllocateError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, there are only three call sites for that method, only two of which change engine code, so I went ahead and updated them. LMK what you think?
ffi/src/handle.rs
Outdated
} | ||
} | ||
|
||
mod uncreate { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd prefer "uncreatable", since "Uncreate" sounds like something that would do the action of uncreating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with "unconstructable" since that's the most accurate term for something that cannot be constructed?
ffi/src/handle.rs
Outdated
// | ||
// Unlike the default (unsized) implementation, which must wrap the Arc in a Box in order to obtain | ||
// a thin pointer, the sized implementation can directly return a pointer to the Arc's | ||
// underlying. This blanket implementation applies automatically to all types that implement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "the Arc's underlying [object/struct/type]"
Err(*mut EngineError), | ||
} | ||
|
||
type AllocateErrorFn = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So as I understand it, the flow in an error case is something like:
- engine calls kernel function
- kernel encounters error
- kernel calls this allocation function, importantly with a string slice that is only guaranteed to be valid until the allocate function returns
- the engine can build up whatever error struct it wants, as long as the first field is a
KernelError
- kernel gets that back, and then returns it back to the engine
- Now the engine has its own error type that it allocated that it can do whatever it wants with
This makes sense with wanting custom error messages. I guess we just need to decide that that's valuable enough to warrant all this machinery. The alternative being just errno
style errors and:
#[repr(C)]
pub enum ExternResult<T> {
Ok(T),
Err(KernelError),
}
and all the allocate machinery can go away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simple error numbers never seem to be quite enough, and we'd inevitably end up having to rewire things to associate more information.
However, I have definitely wondered if it might be worth forcing engine to provide just one global extern error allocator, which we can then statically reference everywhere instead of passing it as an arg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we provide that allocator in some kind of init
function? in general I like the idea of not needing to pass it everywhere, but don't like the footgun of needing to call init
first.
I think I'd vote to keep it this way for now, and if the passing of the allocator becomes burdensome in the connector code we can look at making it global.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, if it weren't for unit tests (with "weird engine implementation warts" as a secondary concern) I'd probably just have allocate_error
be an undefined extern that the engine has to provide at link time. That would make it harder for unit tests to customize the error implementation (not sure we'd even need that), and if the engine is doing weird stuff where sometimes it wants different behaviors... also makes it hard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Responding to comments.
ffi/src/lib.rs
Outdated
/// | ||
/// The slice must be a valid (non-null) pointer, and must point to the indicated number of | ||
/// valid utf8 bytes. | ||
unsafe fn from_slice(slice: KernelStringSlice) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was originally using from_utf8
, but there are already so many other things that can go wrong in this unsafe code it seemed almost trite to return an ExternResult
, which greatly complicates call chains (**), just for this one detail?
(**) Creating an ExternResult
requires access to an AllocateError
.
ffi/src/lib.rs
Outdated
/// | ||
/// The slice must be a valid (non-null) pointer, and must point to the indicated number of | ||
/// valid utf8 bytes. | ||
unsafe fn from_slice(slice: KernelStringSlice) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, there are only three call sites for that method, only two of which change engine code, so I went ahead and updated them. LMK what you think?
Err(*mut EngineError), | ||
} | ||
|
||
type AllocateErrorFn = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simple error numbers never seem to be quite enough, and we'd inevitably end up having to rewire things to associate more information.
However, I have definitely wondered if it might be worth forcing engine to provide just one global extern error allocator, which we can then statically reference everywhere instead of passing it as an arg.
ffi/src/handle.rs
Outdated
} | ||
} | ||
|
||
mod uncreate { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with "unconstructable" since that's the most accurate term for something that cannot be constructed?
ffi/kernel.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't check in the header files. They should be created as part of the build process and shipped as part of the build along with the kernel .a file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left a few more comments. One high level suggestion as we push to get this merged:
can we split this into a few more modules. 700 odd lines is getting a bit... bulky :)
I think probably all the error stuff can go in its own module, maybe along with the string slice stuff, or that could be on its own.
Then lib.rs
should ideally just have the main "meat" functions that a connector calls.
ffi/src/lib.rs
Outdated
match std::str::from_utf8(slice) { | ||
Ok(s) => Ok(s.to_string()), | ||
Err(e) => Err(Error::generic_err(e)), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a Utf8Error
#[from]
variant for our Error, so you should be able to do:
match std::str::from_utf8(slice) { | |
Ok(s) => Ok(s.to_string()), | |
Err(e) => Err(Error::generic_err(e)), | |
} | |
Ok(std::str::from_utf8(slice)?.to_string()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error[E0277]: `?` couldn't convert the error to `deltakernel::Error`
--> ffi/src/lib.rs:108:38
|
108 | Ok(std::str::from_utf8(slice)?.to_string())
| ^ the trait `From<std::str::Utf8Error>` is not implemented for `deltakernel::Error`
|
= note: the question mark operation (`?`) implicitly performs a conversion on the error value using the `From` trait
= help: the following other types implement trait `From<T>`:
<deltakernel::Error as From<serde_json::error::Error>>
<deltakernel::Error as From<arrow_schema::error::ArrowError>>
<deltakernel::Error as From<url::ParseError>>
<deltakernel::Error as From<std::io::Error>>
<deltakernel::Error as From<parquet::errors::ParquetError>>
<deltakernel::Error as From<FromUtf8Error>>
<deltakernel::Error as From<std::num::ParseIntError>>
= note: required for `Result<std::string::String, deltakernel::Error>` to implement `FromResidual<Result<Infallible, std::str::Utf8Error>>`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just going to leave this as todo for now and we can revisit if we want? (also some discussion of panic instead of errors)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hrmm, that's odd, but yeah, i think just unwrap
and get rid of all of this for now
|
||
#[repr(C)] | ||
#[derive(Debug)] | ||
pub enum KernelError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this (and the From
below) feel like something that should be derived via a proc macro. it's a bit tricky because we don't want it to live in the same module that we define our deltakernel::Error
variant. I suspect there's something we could do though. Probably fine to have it be future work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flushing comments from our discussion
ffi/cffi-test.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should expand this with tests that exercise the real FFI methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub unsafe extern "C" fn snapshot( | ||
path: KernelStringSlice, | ||
table_client: *const ExternEngineInterfaceHandle, | ||
) -> ExternResult<*const SnapshotHandle> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how to fix this warning, given that rust only supports extern "C"
(not extern "C++"
):
warning: 'snapshot' has C-linkage specified, but returns incomplete type
'ExternResult<const SnapshotHandle *>'
which could be incompatible with C [-Wreturn-type-c-linkage]
ExternResult<const SnapshotHandle*> snapshot(KernelStringSlice path,
^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[repr(C)] | ||
pub enum ExternResult<T> { | ||
Ok(T), | ||
Err(*mut EngineError), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semantics: Kernel will always immediately return the leaked engine error to the engine (if it allocated one at all), and engine is responsible to free it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added comment
|
||
// NOTE: We can't "just" impl From<DeltaResult<T>> because we require an error allocator. | ||
impl<T> ExternResult<T> { | ||
pub fn is_ok(&self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure would be handy to have access to these methods in C++ as well, but cbindgen doesn't seem to emit the member methods of a class.
#[repr(C)] | ||
pub struct EngineSchemaVisitor { | ||
// opaque state pointer | ||
data: *mut c_void, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this use a formal handle type?
Or is it "one-shot" and so whocares?
Note: Kernel really doesn't care what type it is -- engine owns and manages it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seemed like we decided who cares?
state.inflight_expressions.take(exprid) | ||
} | ||
|
||
fn visit_expression_binary( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could go to a support module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Caller is responsible to (at most once) pass a valid pointer returned by a call to | ||
/// [kernel_scan_files_init]. | ||
#[no_mangle] | ||
pub unsafe extern "C" fn kernel_scan_files_free(files: *mut KernelScanFileIterator) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably be consistent with drop
vs. free
on engine side (probably the latter is more intuitive to non-rust code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added comment for now, lmk if we want this upgraded to an issue
…fault client and FFI wrappers for it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opened issues for pending items and did a tiny bit of cleanup to offload from ryan! had pretty extensive discussion, current state LGTM and we can merge then iterate :)
ffi/cffi-test.c
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ffi/src/lib.rs
Outdated
match std::str::from_utf8(slice) { | ||
Ok(s) => Ok(s.to_string()), | ||
Err(e) => Err(Error::generic_err(e)), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error[E0277]: `?` couldn't convert the error to `deltakernel::Error`
--> ffi/src/lib.rs:108:38
|
108 | Ok(std::str::from_utf8(slice)?.to_string())
| ^ the trait `From<std::str::Utf8Error>` is not implemented for `deltakernel::Error`
|
= note: the question mark operation (`?`) implicitly performs a conversion on the error value using the `From` trait
= help: the following other types implement trait `From<T>`:
<deltakernel::Error as From<serde_json::error::Error>>
<deltakernel::Error as From<arrow_schema::error::ArrowError>>
<deltakernel::Error as From<url::ParseError>>
<deltakernel::Error as From<std::io::Error>>
<deltakernel::Error as From<parquet::errors::ParquetError>>
<deltakernel::Error as From<FromUtf8Error>>
<deltakernel::Error as From<std::num::ParseIntError>>
= note: required for `Result<std::string::String, deltakernel::Error>` to implement `FromResidual<Result<Infallible, std::str::Utf8Error>>`
ffi/src/lib.rs
Outdated
match std::str::from_utf8(slice) { | ||
Ok(s) => Ok(s.to_string()), | ||
Err(e) => Err(Error::generic_err(e)), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just going to leave this as todo for now and we can revisit if we want? (also some discussion of panic instead of errors)
state.inflight_expressions.take(exprid) | ||
} | ||
|
||
fn visit_expression_binary( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Caller is responsible to (at most once) pass a valid pointer returned by a call to | ||
/// [kernel_scan_files_init]. | ||
#[no_mangle] | ||
pub unsafe extern "C" fn kernel_scan_files_free(files: *mut KernelScanFileIterator) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added comment for now, lmk if we want this upgraded to an issue
struct ExternEngineInterfaceVtable { | ||
// Actual table client instance to use | ||
client: Arc<dyn EngineInterface>, | ||
allocate_error: AllocateErrorFn, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added comment
oh and just opened #163 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! lgtm to merge then iterate. thanks!
Co-authored-by: Ryan Johnson <[email protected]>
Ability for engine to get a snapshot, traverse the schema, provide skipping predicates, and obtain the file list for scan.
Consumed by a duckdb extension (TODO: figure out how publish the extension so others can actually try this out!)
Not at all complete, but posting for ideas on how to improve it.
This took a lot of work vs. the previous hackathon because the ffi is now in a separate crate. In particular: