Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Buffer reads from package tarfiles #1840

Merged
merged 1 commit into from
May 9, 2019

Conversation

rbtcollins
Copy link
Contributor

@rbtcollins rbtcollins commented May 9, 2019

Buffer reads from package tarfiles

This has variable impact on extract - when plenty of buffer cache is
available no impact; but in low-cache situations this avoids contention
as rust-docs writes several hundred MB but needs read only 11MB of
archive. I've measured this at between 0 seconds and 12 seconds of
improvement accordingly, depending on the scenario. The buffer size of
8MB is chosen to be twice the block size of the largest SSD around
today, so hopefully triggering read-ahead, which can matter if our
rchives were to increase in size. 4MB could be chosen, or an adaptive
buffer used, if we need to run on embedded devices.

It may be still better to never write these tarfiles to disk,
but that is a substantially larger change and I'm confident enough that
this is beneficial to propose it now.

In order to do this either an additional handle was required, or we
could simplify FileReaderWithProgress length handling - sent_start was
never being toggled: we were notifying the download tracker on every
read that hit the progress adapter. Rather than having that logic in
read(), move it to new_file, as that is a more obvious place to have it
anyway.

This has variable impact on extract - when plenty of buffer cache is
available no impact; but in low-cache situations this avoids contention
as rust-docs writes several hundred MB but needs read only 11MB of
archive. I've measured this at between 0 seconds and 12 seconds of
improvement accordingly, depending on the scenario. The buffer size of
8MB is chosen to be twice the block size of the largest SSD around
today, so hopefully triggering read-ahead, which can matter if our
archives were to increase in size. 4MB could be chosen, or an adaptive
buffer used, if we need to run on embedded devices.

It may be still better to never write these tarfiles to disk,
but that is a substantially larger change and I'm confident enough that
this is beneficial to propose it now.

In order to do this either an additional handle was required, or we
could simplify FileReaderWithProgress length handling - sent_start was
never being toggled: we were notifying the download tracker on every
read that hit the progress adapter. Rather than having that logic in
read(), move it to new_file, as that is a more obvious place to have it
anyway.
@rbtcollins
Copy link
Contributor Author

(I tried to separate out the two logical changes, but really one has to be done to enable the other(

Copy link
Contributor

@kinnison kinnison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I'm sad I didn't think of using a BufReader when I first wrote this :D

@kinnison kinnison merged commit 0baefed into rust-lang:master May 9, 2019
@rbtcollins
Copy link
Contributor Author

LGTM, I'm sad I didn't think of using a BufReader when I first wrote this :D

Until you've climbed deep enough into the std lib to know how little buffering there is by default, its probably unobvious that one would be needed: in Python for instance the default is to be buffered...

@rbtcollins rbtcollins deleted the buffer-package-reads branch May 9, 2019 07:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants