Add simple Ruby client example #19

Merged: 12 commits, Mar 14, 2024

amoeba (Member) commented Mar 13, 2024

Fixes apache/arrow#40478.
Goes with open PR for Ruby server example #17.

Hi @kou, do you want to have a look? These Ruby APIs aren't familiar to me and I haven't used red-arrow before, so I imagine this can be improved. The second example is nonsense right now and doesn't work, but I'll look at this again tomorrow. Edit: With a fresh set of eyes, I think I fixed the streaming example, or at least made it run.

So far I've just tested this against the Ruby server PR and not others.

kou (Member) left a comment

(I'll add a comment about streaming approach later.)

http/get_simple/ruby/client/Gemfile.lock (outdated, resolved)
http/get_simple/ruby/client/client.rb (resolved)
http/get_simple/ruby/client/client.rb (outdated, resolved)
http/get_simple/ruby/client/client.rb (outdated, resolved)
kou (Member) commented Mar 13, 2024

We need to implement arrow::ipc::StreamDecoder bindings for the streaming case. arrow::ipc::StreamDecoder is used by the C++ client example: https://github.com/apache/arrow-experiments/blob/main/http/get_simple/cpp/client/client.cpp

Net::HTTP passes the response body to a callback in chunks; it doesn't provide a blocking read API. So we can't use Arrow::RecordBatchStreamReader, which requires a blocking read API. We need to use arrow::ipc::StreamDecoder for callback-style reads.
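To make the "callback-style read" idea concrete, here is a minimal sketch of a push-style decoder in plain Ruby. It is not arrow::ipc::StreamDecoder and does not parse the Arrow IPC format: the class name ToyPushDecoder and the framing (a 4-byte big-endian length followed by the payload) are assumptions made up for illustration. It only shows the general shape: the transport hands over chunks of any size, and the decoder buffers them and emits each message once it is complete.

```ruby
# Hypothetical push-style decoder. NOT Arrow's IPC format: the framing
# here (4-byte big-endian length, then payload) is invented for the sketch.
class ToyPushDecoder
  def initialize(&on_message)
    @buffer = String.new(encoding: Encoding::BINARY)
    @on_message = on_message
  end

  # Accept whatever bytes the transport delivers, emit every message that
  # is now complete, and keep the remainder for the next call.
  def consume(chunk)
    @buffer << chunk.b
    loop do
      break if @buffer.bytesize < 4
      length = @buffer.byteslice(0, 4).unpack1("N")
      break if @buffer.bytesize < 4 + length
      @on_message.call(@buffer.byteslice(4, length))
      @buffer = @buffer.byteslice(4 + length, @buffer.bytesize)
    end
  end
end

messages = []
decoder = ToyPushDecoder.new { |message| messages << message }

# Three framed messages concatenated into one byte stream.
stream = ["batch-1", "batch-22", "b3"].map { |m| [m.bytesize].pack("N") + m }.join

# Feed the stream in 5-byte pieces, so chunk boundaries fall in the
# middle of length prefixes and payloads.
stream.bytes.each_slice(5) { |piece| decoder.consume(piece.pack("C*")) }

p messages # => ["batch-1", "batch-22", "b3"]
```

A pull-style reader would instead ask the source for exactly the bytes it needs and wait for them, which is why it needs a blocking read API.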

kou (Member) commented Mar 13, 2024

For arrow::ipc::StreamDecoder binding: apache/arrow#40493

amoeba (Member, Author) commented Mar 13, 2024

Thanks for the review @kou. How about I remove the streaming example for now?

ianmcook (Member) commented

@amoeba if you remove the streaming example, can you copy it into a Gist and we can link to that from the README for future reference?

amoeba (Member, Author) commented Mar 13, 2024

Sure. I put it in a gist. While it works, I don't think it actually streams (I think it buffers the entire response in memory before moving on) but it's there.

ianmcook (Member) commented

Ok, thanks! I'll leave it to you to link it from the readme (or not) if you think it's worth it.

kou (Member) left a comment

We can remove the current streaming implementation and merge this.

BTW, the current implementation isn't streaming because res.read_body blocks until all response data has been received. We need to rewrite it entirely. So I don't think we need to link to the Gist from the README.
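The difference between the two ways of calling read_body can be sketched with the standard library alone. The local TCPServer below is a hypothetical stand-in for an Arrow HTTP server; the payload, the 0.3 second pause, and the variable names are assumptions for illustration. Without a block, read_body returns one String only after the whole body has been read; with a block, it yields each piece as it is read from the socket.

```ruby
require "net/http"
require "socket"

payload = "a" * 512 + "b" * 512

# Hypothetical stand-in server: answers two connections, sending the
# payload in two halves with a pause in between.
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]
server_thread = Thread.new do
  2.times do
    client = server.accept
    # Skip the request line and headers (up to the blank line).
    while (line = client.gets) && line != "\r\n"; end
    client.write("HTTP/1.1 200 OK\r\n" \
                 "Content-Length: #{payload.bytesize}\r\n" \
                 "Connection: close\r\n\r\n")
    client.write(payload[0, 512])
    sleep 0.3
    client.write(payload[512, 512])
    client.close
  end
end

# Without a block: read_body returns once the entire body has been read.
# The trailing nil disables any proxy configured in the environment.
whole_body = nil
Net::HTTP.start("127.0.0.1", port, nil) do |http|
  http.request_get("/") { |response| whole_body = response.read_body }
end

# With a block: each piece is yielded as it is read from the socket.
chunks = []
Net::HTTP.start("127.0.0.1", port, nil) do |http|
  http.request_get("/") do |response|
    response.read_body { |chunk| chunks << chunk unless chunk.empty? }
  end
end

server_thread.join
server.close

puts "without block: one string of #{whole_body.bytesize} bytes"
puts "with block: #{chunks.size} chunk(s), #{chunks.sum(&:bytesize)} bytes in total"
```

The block form only delivers raw bytes at arbitrary boundaries, so a consumer still needs a decoder that accepts partial input.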

http/get_simple/ruby/client/.gitignore (outdated, resolved)
http/get_simple/ruby/client/client.rb (outdated, resolved)

amoeba (Member, Author) commented Mar 14, 2024

Thanks @kou.

BTW, the current implementation isn't streaming because res.read_body blocks until all response data has been received. We need to rewrite it entirely. So I don't think we need to link to the Gist from the README.

Yep, I agree. I should've made that clearer above. I just merged your changes and am going to give them a quick test.

amoeba (Member, Author) commented Mar 14, 2024

Thanks @ianmcook. This is good to go now. I tested locally and get this when I run it against the recently merged server example:

$ bundle exec ruby client.rb
24415 record batches received
3.51 seconds elapsed

ianmcook (Member) left a comment

Thanks @amoeba and @kou!

I successfully tested this against all the server examples.

While testing, I temporarily added this code at the bottom to write the table to a file:

output_file_path = "output-ruby.arrow"
output_stream = Arrow::FileOutputStream.new(output_file_path, false)
writer = Arrow::RecordBatchFileWriter.new(output_stream, schema)
writer.write_table(table)
writer.close
output_stream.close

I confirmed that all the resulting files were valid.

ianmcook merged commit 9ba7081 into apache:main on Mar 14, 2024

kou (Member) commented Mar 15, 2024

While testing, I temporarily added this code at the bottom to write the table to a file:

output_file_path = "output-ruby.arrow"
output_stream = Arrow::FileOutputStream.new(output_file_path, false)
writer = Arrow::RecordBatchFileWriter.new(output_stream, schema)
writer.write_table(table)
writer.close
output_stream.close

FYI: This can be written as just table.save("output-ruby.arrows"). :-)

Linked issue: [Ruby] Create simple example of Ruby HTTP GET Arrow client