Write build artifacts to (cloud) storage from build servers #5549
Conversation
Seems simple enough at first glance. I'm more worried about figuring out how to get Azure to actually accept all our uploaded files, which has seemed tricky thus far :/
```python
    localmedia=bool(outcomes['localmedia']),
    pdf=bool(outcomes['pdf']),
    epub=bool(outcomes['epub']),
)
else:
```
Is this `else` on the `try` instead of the `if` now? I can never remember what `try`/`else` does.
That's correct. An `else` on a `try` is taken if there is no exception. I refactored that slightly because we defined `build_id` in a `try` block and then use it later. The only reason it wasn't ever a `NameError` is that we reraise the error in both `except` blocks.
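For reference, a minimal, generic illustration of `try`/`else` semantics (not the PR's actual code; `fetch_build_id` is a made-up helper):

```python
import logging

log = logging.getLogger(__name__)


def fetch_build_id():
    # Made-up stand-in for the API call that defines build_id in the PR.
    return 5549


try:
    build_id = fetch_build_id()
except Exception:
    log.exception('Failed to fetch the build id')
    raise  # re-raising means the code below never sees an undefined build_id
else:
    # The else clause runs only when the try block raised no exception,
    # so build_id is guaranteed to be defined here.
    print(build_id)
```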
I have a couple of ideas here which I would like to try. Firstly, when overwriting a file, we can use …
I made a first pass at writing everything to storage.

Testing this out

To try this out you need to have a setting:
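As a sketch, the setting would be something along these lines in a Django settings file (the storage class path here is an assumption for illustration, not necessarily the exact one from the PR):

```python
# Hypothetical settings snippet; the real class path in the PR may differ.
RTD_BUILD_MEDIA_STORAGE = 'readthedocs.builds.storage.BuildMediaFileSystemStorage'
```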
In the above example, the …

Notes
After digging deeper, we want to use …
I didn't implement this yet, although it should be rather easy in the new setup. I suspect that this problem might have been us calling …
- Useful for writing build artifacts under MEDIA_ROOT in dev
I added a … With that setting, all build artifacts will be written under …
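A minimal sketch of what such a dev storage backend could look like, assuming a `copy_directory` helper like the one used later in this PR (class and module names are illustrative, not the exact PR code):

```python
# Illustrative sketch only; the real class in the PR may differ.
from pathlib import Path

from django.core.files import File
from django.core.files.storage import FileSystemStorage


class BuildMediaFileSystemStorage(FileSystemStorage):

    """Store build artifacts under MEDIA_ROOT (development only)."""

    def copy_directory(self, source, destination):
        """Recursively copy a local directory into this storage."""
        source = Path(source)
        for filepath in source.iterdir():
            sub_destination = f'{destination}/{filepath.name}'
            if filepath.is_dir():
                self.copy_directory(filepath, sub_destination)
            elif filepath.is_file():
                with filepath.open('rb') as fd:
                    self.save(sub_destination, File(fd, name=filepath.name))
```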
These changes look good (and simpler). Definitely a cleaner approach, and with much less copying between our own machines :)
Tested locally with …
I think I'm +1 on shipping this with it turned on in `-ops`, so that we're writing everything to Azure. It will continue to write to the builders, which is good.
We probably need to find a way to configure the .org to not copy the media files but continue to have the .com do it properly. I'd be 👍 with turning off copying to the webs of pdf/epub/htmlzip.
I looked into this briefly and it may be a little tricky because the code to index files (…
- Use the RTD_ prefix
- Assume that settings.RTD_BUILD_MEDIA_STORAGE is set (defined in base)
I believe this PR is mergeable, although we can't completely disable syncers yet (locally or in prod).
Excited to ship this. I do think we need to be a bit more defensive here, to make sure we aren't letting exceptions break the prod code paths.
```python
    localmedia=bool(outcomes['localmedia']),
    pdf=bool(outcomes['pdf']),
    epub=bool(outcomes['epub']),
)
```
We likely need to try/except this, so that if it fails, we still run the old syncers.
That's fair, but I think I'll handle this try/except block in the function around the actual storage writes, which could throw errors.
```python
    localmedia=bool(outcomes['localmedia']),
    pdf=bool(outcomes['pdf']),
    epub=bool(outcomes['epub']),
)
```
Same here, this should probably be a try/except as well.
This method, which only had code removed, shouldn't really throw an exception randomly. Do you really think we need a try block for it?
readthedocs/projects/tasks.py (outdated)
```python
        msg=f'Writing {media_type} to media storage - {to_path}',
    ),
)
storage.copy_directory(from_path, to_path)
```
This likely should be `try`/`except` as well, so when it fails it doesn't stop all the other formats from syncing.
I added a try block around the writes/deletes to storage just in case.
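As a sketch, the guarded writes could look something like this (`copy_to_storage` and the surrounding names are illustrative, not the exact PR code):

```python
import logging

log = logging.getLogger(__name__)


def copy_to_storage(storage, types_to_copy, paths):
    """Copy each build format to storage, isolating failures per format."""
    for media_type, build_type in types_to_copy:
        from_path, to_path = paths[media_type]
        try:
            storage.copy_directory(from_path, to_path)
        except Exception:
            # Broad on purpose: some storage backends raise more than IOError.
            # A failure on one format should not stop the other formats.
            log.exception('Error copying %s to media storage', from_path)
```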
I like these changes. Left some small comments to consider.
Setting `RTD_BUILD_MEDIA_STORAGE` in production will write all build artifacts to storage (HTML, PDFs, etc.) but we still need a syncer. This will result in PDFs being written to the webs again, but this is a temporary state until we proxy HTML requests to storage.
I'm not sure I follow here. Why will PDFs be written to the webs again? Because of this line?
Depending on when we want to shrink our disks, I'd say that this is OK. Although, if we want to shrink them soon, we could add a setting to disable copying PDF/ePub into the storage in production. Actually, there's no need to define a new setting: we could just check `BUILD_MEDIA_STORAGE` and, if it's Azure, not copy them to the webs.
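A sketch of that idea; the setting name comes from this PR, but the check itself is hypothetical:

```python
from django.conf import settings


def should_copy_media_to_webs():
    # Hypothetical check: skip syncing PDF/ePub to the web servers when
    # build media is already being written to Azure blob storage.
    storage_class = getattr(settings, 'RTD_BUILD_MEDIA_STORAGE', '')
    return 'azure' not in storage_class.lower()
```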
```python
else:
    types_to_delete.append('epub')

for media_type, build_type in types_to_copy:
```
nit: This is a little confusing to me. What are `media_type` and `build_type`? I suppose the first one is the path name where we want to save it, and the latter is the path name from the builder. If so, where do they come from? Isn't it possible to use a constant or get the name from the related class where this is defined?
I tend to agree, but I believe this is a larger refactor and should be handled in a separate PR. These constants are not defined somewhere convenient (like `readthedocs/doc_builder/constants.py`) and they are used in quite a few places outside the scope of this PR, like the doc builder backends themselves.
readthedocs/projects/tasks.py (outdated)
```python
except Exception:
    # Ideally this should just be an IOError
    # but some storage backends unfortunately throw other errors
    log.warning(
```
Now that we are duplicating the copy, a warning could be enough. Although, I think we want to log an `.exception` here so we can see it in Sentry and take care of it. Otherwise, if the sync fails for any reason, we won't know and we will see random issues on web pages.
Anyway, `.exception` could generate a flood of notifications here.
Maybe a `log.warning` with `exc_info=True`?
I'm not sure what I want :)
My concern is that if this fails we won't be alerted/notified at all. Although, alerting us for every single file that failed to copy is crazy.
I think we just do a `log.exception` and Sentry can sort it out. It looks like Sentry is smart enough to find similar issues even when the text isn't exactly the same.
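For reference, the two options differ mainly in level: `log.exception(...)` is shorthand for `log.error(..., exc_info=True)`, which Sentry typically turns into an event, while `log.warning(..., exc_info=True)` records the same traceback at a level that usually doesn't alert. A sketch using the names from the diff above:

```python
try:
    storage.copy_directory(from_path, to_path)
except Exception:
    # Option chosen here: an ERROR-level event with traceback for Sentry.
    log.exception('Error copying %s to media storage', from_path)
    # Alternative discussed: same traceback, WARNING level, no alert flood.
    # log.warning('Error copying %s to media storage', from_path, exc_info=True)
```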
requirements/pip.txt (outdated)
```diff
@@ -97,5 +97,8 @@ django-cors-middleware==1.3.1
 # User agent parsing - used for analytics purposes
 user-agents<1.2.0
+
+# Utilities used to upload build media to cloud storage
+django-storages>=1.7,<1.8
```
I prefer to pin our dependencies to an exact version so the environment is reproducible. You can use something like the following to allow weekly updates via our bot while still forcing `<1.8`:

```
django-storages==1.7.1 # pyup: <1.8
```
Goals
Considerations
Down the road
For this PR, I think we'll keep the code that "syncs" files to the webs as well as adding code to write to "storage". Once we are happy with serving things from blob storage via a reverse proxy, we will no longer need `move_files` or `sync_files` whatsoever, and the syncers/pullers could go away completely. The web servers won't need much in the way of disks attached to them (both simpler and cheaper web servers). They could be completely ephemeral and potentially autoscaled. `update_app_instances` could then simply be the API call to the webs to update the database that the build finished.