-
Notifications
You must be signed in to change notification settings - Fork 290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use IO.copy_stream when possible #383
base: master
Are you sure you want to change the base?
Conversation
@@ -651,8 +651,8 @@ def redirect_uri_callback=(redirect_uri_callback) | |||
# use get method. get returns HTTP::Message as a response and you need to | |||
# follow HTTP redirect by yourself if you need. | |||
def get_content(uri, *args, &block) | |||
query, header = keyword_argument(args, :query, :header) | |||
success_content(follow_redirect(:get, uri, query, nil, header || {}, &block)) | |||
query, header, to = keyword_argument(args, :query, :header, :to) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can do better in term of API. But it seemed logical to expose this optimized code path through get_content
only.
So: client.get_content('/big-file.bin', to: '/tmp/big-file.bin')
@inflater = inflater | ||
end | ||
|
||
def write(chunk) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something that is not very well documented, is that copy_steam
do accept fake IO objects as long as they respond to #write()
. However it's kind of a fallback codepath, as it won't be able to use sendfile()
so there will be no speed up.
@@ -829,16 +845,6 @@ def test_get_with_block_arity_2_and_redirects | |||
assert_nil(res.content) | |||
end | |||
|
|||
def test_get_with_block_string_recycle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel bad about removing that test, but unfortunately read_block_size
doesn't make any sense in case sendfile()
is used.
But the next test, test the same behavior with chunked response, so I think it's okish to remove it.
ea04649
to
5e0d02c
Compare
Fix: #66
Context
Ref: googleapis/google-cloud-ruby#1897
We noticed that Google Cloud Storage's ruby library performance on download was heavily impacted by CPU usage on the host, especially for big files. After some digging it was clear it's due to how the data has to transit through
read()
andwrite()
instead of leveragingsendfile()
.An experiment using a quick and dirty patch showed a reduction from
15s
to5s
for a 500MB download.The patch
To leverage
sendfile()
in ruby, the best and simplest API isIO.copy_stream
as suggested in #66.The problem is that
copy_stream
need IO or IO like objects to work with, andhttpclient
's API mostly deal with blocks, so I had to adapt the API somehow.One important thing to note, is that we can only leverage
sendfile
if there is no modifications to apply on the request body, e.g. no chunking, no compression.I'll add comments on specific parts of the patch in a later comments.