Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-40488: [Swift] Add simple get swift example #41

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

abandy
Copy link

@abandy abandy commented Nov 6, 2024

This example uses Hummingbird for the http server and a URLSessionDataTask to pull the record batch down from the server.

@ianmcook
Copy link
Member

ianmcook commented Nov 6, 2024

Thanks for doing this @abandy!

Could you please remove the vendored Arrow package and set up Package.swift to install it from a URL? Edit: I see that this is not possible. Please see my suggested workaround below.

Could you also please add me (ianmcook) as a collaborator to your fork here so I can commit some changes to this PR (for example a readme file)?

@ianmcook
Copy link
Member

ianmcook commented Nov 6, 2024

Could you please remove the vendored Arrow package and set up Package.swift to install it from a URL?

Oh, I see now that this might not be possible, as discussed at apache/arrow#38872. Is there any available workaround that allows us to remove the vendored package? I have a strong preference not to have vendored libraries in these examples if it is at all possible.

@ianmcook
Copy link
Member

ianmcook commented Nov 6, 2024

@abandy Here's a workaround that I think is better than vendoring the full Swift package here: We can just provide instructions in a readme file for how to install it locally by running a few simple shell commands. Would that work OK?

@abandy
Copy link
Author

abandy commented Nov 7, 2024

Thanks for doing this @abandy!

Could you please remove the vendored Arrow package and set up Package.swift to install it from a URL? Edit: I see that this is not possible. Please see my suggested workaround below.

Could you also please add me (ianmcook) as a collaborator to your fork here so I can commit some changes to this PR (for example a readme file)?

@ianmcook No problem, glad I can help! I have added you as a collaborator.

The workaround makes sense. I will remove the vendor package and add a README with instructions. I hope to get these changes up by EOW.

@shaps80
Copy link

shaps80 commented Nov 12, 2024

Just a couple of questions if I may:

  1. Is the plan still to move to a dedicated repository (I know there are some questions around integration tests on this)
  2. What's the reason for the Alamofire dependency?

Amazing to see some progress on the Swift side, thanks!

@ianmcook
Copy link
Member

  1. Is the plan still to move to a dedicated repository (I know there are some questions around integration tests on this)

Yes, ultimately the plan is to move the examples that are in the http directory of this repo to a permanent place and add proper integration testing to ensure interoperability and catch any regressions. There's no clear timeframe for this right now. There is tension between "code that is written to pass CI" and "code that is written to serve as a useful example" and often in the Arrow community we have prioritized the former at the expense of the latter. At this stage with the Arrow-over-HTTP work I think it is important to prioritize the latter, but we will eventually have to figure out how to also do the former.

@abandy
Copy link
Author

abandy commented Nov 13, 2024

Just a couple of questions if I may:

  1. Is the plan still to move to a dedicated repository (I know there are some questions around integration tests on this)
  2. What's the reason for the Alamofire dependency?

Amazing to see some progress on the Swift side, thanks!

Hi @shaps80,

Good questions!

  1. I am guessing you mean the Apache arrow repo? If so, I think the conversation around moving it to it's own repo needs to be brought to the forefront again. It definitely would be easier to use if it was!
  2. I had started my example using Alamofire but then switched to URLSession (I missed removing this dependency :). Thanks for catching this!). Alamofire is a nice library and very mature but the URLSession.shared.dataTask was very straight forward!

@abandy abandy force-pushed the main branch 5 times, most recently from 24d8832 to 2ec7957 Compare November 13, 2024 02:08
@ianmcook
Copy link
Member

@shaps80 oops, sorry if I answered a different question than the one you were asking. I thought you were asking if these examples would be moved to a different repo. If you were asking about the Swift implementation, @abandy replied about that. Thanks.

@shaps80
Copy link

shaps80 commented Nov 13, 2024

@abandy thanks for that. Nice to hear we don't need any 3rd party dependencies 👍 (URLSession should suffice for 99% of cases these days)

I agree it'd be much better to move to an isolated repo as currently we can't integrate with SPM.

I'm currently working on an app for Geo where we need to store vast amounts of graph data and be able to query etc locally. So I'm keen to try Arrow for our needs, but I really need to get a better grasp of the concepts and the example so I'm not in a place to commit any contributions atm.

But if I can work it out and grasp the ideas, I'd be keen to help out in the future.

My main question atm, and I've love to get an example of this together myself, is how I can integrate with Arrow while still having idiomatic types that integrate well with SwiftUI.

I don't have time this week to explore but hopefully next week I can spend some time on this.

Thanks again!

@abandy
Copy link
Author

abandy commented Nov 14, 2024

@abandy thanks for that. Nice to hear we don't need any 3rd party dependencies 👍 (URLSession should suffice for 99% of cases these days)

I agree it'd be much better to move to an isolated repo as currently we can't integrate with SPM.

I'm currently working on an app for Geo where we need to store vast amounts of graph data and be able to query etc locally. So I'm keen to try Arrow for our needs, but I really need to get a better grasp of the concepts and the example so I'm not in a place to commit any contributions atm.

But if I can work it out and grasp the ideas, I'd be keen to help out in the future.

My main question atm, and I've love to get an example of this together myself, is how I can integrate with Arrow while still having idiomatic types that integrate well with SwiftUI.

I don't have time this week to explore but hopefully next week I can spend some time on this.

Thanks again!

Hi @shaps80, thank you! If you have any questions in regards to Apache Arrow Swift or need any assistance with using the API just let me know! If you have time to contribute that would be awesome and even getting feedback on any issues you might have when using the API would be very useful.

There is an Arrow Encoder and Decoder included in the Swift arrow repo that converts to and from a Swift object (there are restrictions on data types). This might help with moving data between Arrow and the UI.

I have written an open source native Swift Apache Arrow Query Engine if you are looking to run SQL from Swift. It doesn't contain the full set of SQL functionality but it might be useful. It can be found here: https://github.com/abandy/swiftqe. (I am hoping to add subqueries by end of December)

@ianmcook
Copy link
Member

@abandy could you please split this example into a separate server (in http/get_simple/swift/server) and client (in http/get_simple/swift/client) like the other examples here? One of the goals of the examples here is to test interoperability between the different servers and clients, and to do that we need them to be separate. Thanks.

@abandy
Copy link
Author

abandy commented Nov 15, 2024

@abandy could you please split this example into a separate server (in http/get_simple/swift/server) and client (in http/get_simple/swift/client) like the other examples here? One of the goals of the examples here is to test interoperability between the different servers and clients, and to do that we need them to be separate. Thanks.

Hi @ianmcook, gotcha, will do. I will try to get this update in by EOW.

@abandy abandy force-pushed the main branch 2 times, most recently from c422e9a to e7fbbfc Compare November 19, 2024 01:55
switch arrowWriter.toStream(writerInfo) {
case .success(let writeData):
print("sending recordBatchs: \(recordBatchs.count)")
return ByteBuffer(data: writeData)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sets the Content-Type response header to the appropriate value.

Suggested change
return ByteBuffer(data: writeData)
return .init(
status: .ok, headers: [.contentType: "application/vnd.apache.arrow.stream"],
body: .init(byteBuffer: buffer))

}

let router = Router()
router.get("/") { request, _ -> ByteBuffer in
Copy link
Member

@ianmcook ianmcook Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other suggested change requires this change also.

Suggested change
router.get("/") { request, _ -> ByteBuffer in
router.get("/") { request, _ -> Response in

@ianmcook
Copy link
Member

@abandy thanks for splitting up the client and server.

I am able to get the Swift client running with the Swift server. But when I try to run the Swift client with any of the other server examples, I see this error:

Arrow/ArrowReader.swift:234: Fatal error: Unexpectedly found nil while unwrapping an Optional value

Could you please test this with one of the other server examples here? Thanks

@ianmcook
Copy link
Member

It seems like the response that the Swift server is serving is not a valid Arrow IPC stream. Here's what I tried to do:

  1. Start the Swift server.
  2. Use curl to connect to the server and download the data to a file:
curl -o file.arrows http://127.0.0.1:8081
  1. In Python, try to read the file with PyArrow:
import pyarrow as pa

with open('file.arrows', 'rb') as f:
    reader = pa.ipc.open_stream(f)
    reader.schema
    reader.read_next_batch()
    reader.read_next_batch()

When I do this, PyArrow raises the error:

Invalid flatbuffers message.

Do you know what's causing this? Thanks

@abandy
Copy link
Author

abandy commented Nov 22, 2024

@abandy thanks for splitting up the client and server.

I am able to get the Swift client running with the Swift server. But when I try to run the Swift client with any of the other server examples, I see this error:

Arrow/ArrowReader.swift:234: Fatal error: Unexpectedly found nil while unwrapping an Optional value

Could you please test this with one of the other server examples here? Thanks

Hi Ian,

It looks like the reader is failing while trying to read the schema from the footer. I believe the footer should always have a schema (AFAIK). Maybe a flatbuffer issue (I know Swift Arrow currently targets a specific version of flatbuffers as the latest was causing an issue, could be related(?)). I will look into this.

@abandy
Copy link
Author

abandy commented Nov 22, 2024

It seems like the response that the Swift server is serving is not a valid Arrow IPC stream. Here's what I tried to do:

  1. Start the Swift server.
  2. Use curl to connect to the server and download the data to a file:
curl -o file.arrows http://127.0.0.1:8081
  1. In Python, try to read the file with PyArrow:
import pyarrow as pa

with open('file.arrows', 'rb') as f:
    reader = pa.ipc.open_stream(f)
    reader.schema
    reader.read_next_batch()
    reader.read_next_batch()

When I do this, PyArrow raises the error:

Invalid flatbuffers message.

Do you know what's causing this? Thanks

I know there was an issue with the latest version of flatbuffers and Swift Arrow. Swift Arrow is bound to a specific version. Could be this is causing the issue. I will look into it.

@abandy
Copy link
Author

abandy commented Dec 2, 2024

@ianmcook I have opened a bug in the apache repo for tracking this bug (apache/arrow#44910). I think we should be able to merge this PR and then follow up with the inconsistent behavior issue with the bug?

-->

# HTTP GET Arrow Data: Simple Swift Client Example

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
> [!CAUTION]
> This Swift client example is compatible _only_ with the Swift server example. It is incompatible with all the other server examples in the `get_simple` directory of this repository. This is because of a problem in how the Swift Arrow package implements the Arrow IPC stream format (https://github.com/apache/arrow/issues/44910).

-->

# HTTP GET Arrow Data: Simple Swift Server Example

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
> [!CAUTION]
> This Swift server example is compatible _only_ with the Swift client example. It is incompatible with all the other client examples in the `get_simple` directory of this repository. This is because of a problem in how the Swift Arrow package implements the Arrow IPC stream format (https://github.com/apache/arrow/issues/44910).

@ianmcook
Copy link
Member

ianmcook commented Dec 3, 2024

@ianmcook I have opened a bug in the apache repo for tracking this bug (apache/arrow#44910). I think we should be able to merge this PR and then follow up with the inconsistent behavior issue with the bug?

My preference is to keep this open while you develop a fix for the bug. If you'd really like it to be merged, I'm OK with that provided that we can include the warnings I've suggested in the README files.

@abandy
Copy link
Author

abandy commented Dec 3, 2024

@ianmcook sounds good. I have implemented a fix and am testing it now. I need to do some refactoring and hope to have a PR out in a couple of days. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants