Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remote: fix race on download error. Fixes #5047 #5665

Closed
wants to merge 1 commit into from

Conversation

buchgr
Copy link
Contributor

@buchgr buchgr commented Jul 24, 2018

For downloading output files / directories we trigger all
downloads concurrently and asynchronously in the background
and after that wait for all downloads to finish. However, if
a download failed we did not wait for the remaining downloads
to finish but immediately started deleting partial downloads
and continued with local execution of the action.

That leads to two interesting bugs:

  1. The cleanup procedure races with the downloads that are still
    in progress. As it tries to delete files and directories, new
    files and directories are created and that will often
    lead to "Directory not empty" errors as seen in builds with remote caching sometimes fails on incomplete downloads #5047.
  2. The clean up procedure does not detect the race, succeeds and
    subsequent local execution fails because not all files have
    been deleted.

For downloading output files / directories we trigger all
downloads concurrently and asynchronously in the background
and after that wait for all downloads to finish. However, if
a download failed we did not wait for the remaining downloads
to finish but immediately started deleting partial downloads
and continued with local execution of the action.

That leads to two interesting bugs:
 1) The cleanup procedure races with the downloads that are still
    in progress. As it tries to delete files and directories, new
    files and directories are created and that will often
    lead to "Directory not empty" errors as seen in bazelbuild#5047.
 2) The clean up procedure does not detect the race, succeeds and
    subsequent local execution fails because not all files have
    been deleted.
@buchgr
Copy link
Contributor Author

buchgr commented Jul 25, 2018

merged as bca1912

@buchgr buchgr closed this Jul 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants