-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-37789: [Integration][Go] Go C Data Interface integration testing #37788
Conversation
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format?
or
In the case of PARQUET issues on JIRA the title also supports:
See also: |
@zeroshade It would be nice if you could take a look at this. This is draft as my Go skills are almost non-existent :-) |
@pitrou I'll definitely take a look at this today. thanks for drafting it out in the first place, I was planning on doing this myself after i saw your other PR get merged! |
|
3d145ff
to
365abee
Compare
} | ||
|
||
func newJsonReader(cJsonPath *C.char) (*arrjson.Reader, error) { | ||
jsonPath := C.GoString(cJsonPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zeroshade I don't really understand our Go coding style. Should I use camelCase
or snake_case
for variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the general style for go is camelCase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for confirming my intuition. Is there any way to enforce that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll see if there's an option i'm missing in the linter workflow that would enforce it
365abee
to
d2eceb5
Compare
@zeroshade This is ready for review now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM just some stylistic nit picks
go_lib="arrow_go_integration.so" | ||
;; | ||
Darwin) | ||
go_lib="arrow_go_integration.so" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't use .dylib
for mac?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahah, yes, we might!
Though, admittedly this entire condition is a bit pedantic as the library is only loaded explicitly using its full pathname :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, though looking at my existing cdata/test/test_export_to_cgo.py
it looks like I do the same thing as you're doing here. I guess the default output name is still .so
on mac unless you provide a -o
option to give it the dylib explicitly. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, according to https://stackoverflow.com/a/29226313, there's a theoretical difference between the two which doesn't seem to matter in practice:
The difference between .dylib and .so on mac os x is how they are compiled. For .so files you use -shared and for .dylib you use -dynamiclib. Both .so and .dylib are interchangeable as dynamic library files and either have a type as DYLIB or BUNDLE. [...] The reason the two are equivalent on Mac OS X is for backwards compatibility with other UNIX OS programs that compile to the .so file type.
# Note: the Arrow Go C Data export functions expect their output | ||
# ArrowStream or ArrowArray argument to be zero-initialized. | ||
# This is currently ensured through the use of `ffi.new`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh, i thought trampoline.c
zero-initialized everything but you're right, we're only trampolining to zero-initialize for streamGetSchema
and streamGetNext
, darn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what trampoline.c
is, but I directly call into CGo-exported functions here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See apache/arrow-adbc#729 for the full discussion, but essentially the callbacks that were populated into the C Data ArrowArrayStream
struct for getting the schema and getting the next record batch had the same assumption that the output arguments were zero-initialized. So @lidavidm created trampoline.c
which essentially provided wrappers around the Go exported streamGetSchema
and streamGetNext
function pointers, and we use the trampoline methods as the function pointers we use (see exportStream
in cdata/exports.go
)
But like i said, it looks like we only did that for the function pointers we set for the callbacks in the ArrowArrayStream
struct and not for the other exported functions for regular import/export.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to apply the same protections for the rest.
Notably, Arrow-C++ does NOT zero-initialize structs before using them so we should try to reflect reality here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I was just thinking it should be done in a different PR than this one, unless @pitrou feels like adding the same protections to the existing exported funcs here 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, let's defer this to another issue and PR.
defaulttz := "UTC" | ||
defaulttz := "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this incorrect by the spec? I thought I pulled this from the c bridge which had the same default? (I could be misremembering though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it was incorrect and failing the integration tests between C++ and Go.
A timestamp with an missing timezone is not the same as a timestamp with the UTC timezone, as explained in the format spec:
Line 264 in 772a01c
/// Timestamp is a 64-bit signed integer representing an elapsed time since a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bah, must have mistaken that when I originally coded this 😦
bf9afb9
to
d6dd8bb
Compare
The CI failures are unrelated. The Go and integration tests are all green, and I believe I've addressed all your comments @zeroshade . Is it ok if I merge this? |
Sorry for not responding yesterday, I was out sick. But this all looked good to me! Thanks! |
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit cc51e68. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them. |
…ting (apache#37788) ### Rationale for this change We want to enable integration testing of the Arrow Go implementation of the C Data Interface, so as to ensure interoperability. ### What changes are included in this PR? 1. Enable C Data Interface integration testing for the Arrow Go implementation 2. Fix compatibility issues found by the integration tests ### Are these changes tested? Yes, by construction. ### Are there any user-facing changes? Bugfixes in the Arrow Go C Data Interface implementation. * Closes: apache#37789 Authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
…ting (apache#37788) ### Rationale for this change We want to enable integration testing of the Arrow Go implementation of the C Data Interface, so as to ensure interoperability. ### What changes are included in this PR? 1. Enable C Data Interface integration testing for the Arrow Go implementation 2. Fix compatibility issues found by the integration tests ### Are these changes tested? Yes, by construction. ### Are there any user-facing changes? Bugfixes in the Arrow Go C Data Interface implementation. * Closes: apache#37789 Authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
…ting (apache#37788) ### Rationale for this change We want to enable integration testing of the Arrow Go implementation of the C Data Interface, so as to ensure interoperability. ### What changes are included in this PR? 1. Enable C Data Interface integration testing for the Arrow Go implementation 2. Fix compatibility issues found by the integration tests ### Are these changes tested? Yes, by construction. ### Are there any user-facing changes? Bugfixes in the Arrow Go C Data Interface implementation. * Closes: apache#37789 Authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
…ting (apache#37788) ### Rationale for this change We want to enable integration testing of the Arrow Go implementation of the C Data Interface, so as to ensure interoperability. ### What changes are included in this PR? 1. Enable C Data Interface integration testing for the Arrow Go implementation 2. Fix compatibility issues found by the integration tests ### Are these changes tested? Yes, by construction. ### Are there any user-facing changes? Bugfixes in the Arrow Go C Data Interface implementation. * Closes: apache#37789 Authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
Rationale for this change
We want to enable integration testing of the Arrow Go implementation of the C Data Interface, so as to ensure interoperability.
What changes are included in this PR?
Are these changes tested?
Yes, by construction.
Are there any user-facing changes?
Bugfixes in the Arrow Go C Data Interface implementation.