Workspace download retries with read timeouts #8974

Closed
keith opened this issue Jul 24, 2019 · 3 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) team-ExternalDeps External dependency handling, remote repositories, WORKSPACE file. type: bug

Comments

keith (Member) commented Jul 24, 2019

It seems that currently, if Bazel fails to download something declared in the WORKSPACE with multiple URLs, it doesn't always fall back to the other URLs. For example, I hit this error:

Starting local Bazel server and connecting to it...
 - /root/.cache/bazel/_bazel_root/c99f8fb4c845b3aae6d69f1a3c75aa35/external/io_bazel_rules_go/go/private/sdk.bzl:93:5
 - /root/.cache/bazel/_bazel_root/c99f8fb4c845b3aae6d69f1a3c75aa35/external/io_bazel_rules_go/go/private/sdk.bzl:261:13
 - /tmp/share/code/WORKSPACE:83:1
ERROR: An error occurred during the fetch of repository 'go_sdk':
   Traceback (most recent call last):
	File "/root/.cache/bazel/_bazel_root/c99f8fb4c845b3aae6d69f1a3c75aa35/external/io_bazel_rules_go/go/private/sdk.bzl", line 78
		_remote_sdk(ctx, [url.format(filename) for url...], <2 more arguments>)
	File "/root/.cache/bazel/_bazel_root/c99f8fb4c845b3aae6d69f1a3c75aa35/external/io_bazel_rules_go/go/private/sdk.bzl", line 149, in _remote_sdk
		ctx.download(url = urls, sha256 = sha256, output ...")
java.io.IOException: Error downloading [https://dl.google.com/go/go1.12.7.linux-amd64.tar.gz] to /root/.cache/bazel/_bazel_root/c99f8fb4c845b3aae6d69f1a3c75aa35/external/go_sdk/go_sdk.tar.gz: Read timed out
ERROR: While resolving toolchains for target @com_github_bazelbuild_buildtools//buildifier:buildifier: invalid registered toolchain '@go_sdk//:go_android_amd64': no such package '@go_sdk//': Traceback (most recent call last):
	File "/root/.cache/bazel/_bazel_root/c99f8fb4c845b3aae6d69f1a3c75aa35/external/io_bazel_rules_go/go/private/sdk.bzl", line 78
		_remote_sdk(ctx, [url.format(filename) for url...], <2 more arguments>)
	File "/root/.cache/bazel/_bazel_root/c99f8fb4c845b3aae6d69f1a3c75aa35/external/io_bazel_rules_go/go/private/sdk.bzl", line 149, in _remote_sdk
		ctx.download(url = urls, sha256 = sha256, output ...")
java.io.IOException: Error downloading [https://dl.google.com/go/go1.12.7.linux-amd64.tar.gz] to /root/.cache/bazel/_bazel_root/c99f8fb4c845b3aae6d69f1a3c75aa35/external/go_sdk/go_sdk.tar.gz: Read timed out
ERROR: Analysis of target '//:buildifier.check' failed; build aborted: invalid registered toolchain '@go_sdk//:go_android_amd64': no such package '@go_sdk//': Traceback (most recent call last):
	File "/root/.cache/bazel/_bazel_root/c99f8fb4c845b3aae6d69f1a3c75aa35/external/io_bazel_rules_go/go/private/sdk.bzl", line 78
		_remote_sdk(ctx, [url.format(filename) for url...], <2 more arguments>)
	File "/root/.cache/bazel/_bazel_root/c99f8fb4c845b3aae6d69f1a3c75aa35/external/io_bazel_rules_go/go/private/sdk.bzl", line 149, in _remote_sdk
		ctx.download(url = urls, sha256 = sha256, output ...")
java.io.IOException: Error downloading [https://dl.google.com/go/go1.12.7.linux-amd64.tar.gz] to /root/.cache/bazel/_bazel_root/c99f8fb4c845b3aae6d69f1a3c75aa35/external/go_sdk/go_sdk.tar.gz: Read timed out
ERROR: Build failed. Not running target

When my WORKSPACE contained:

http_archive(
    name = "io_bazel_rules_go",
    sha256 = "8df59f11fb697743cbb3f26cfb8750395f30471e9eabde0d174c3aebc7a1cd39",
    urls = [
        "https://storage.googleapis.com/bazel-mirror/github.com/bazelbuild/rules_go/releases/download/0.19.1/rules_go-0.19.1.tar.gz",
        "https://github.com/bazelbuild/rules_go/releases/download/0.19.1/rules_go-0.19.1.tar.gz",
    ],
)

I would have expected this to fall back to the GitHub URL before failing.

What's the output of `bazel info release`?

release 0.28.0

@jin jin added team-ExternalDeps External dependency handling, remote repositories, WORKSPACE file. untriaged labels Jul 24, 2019
@laurentlb laurentlb added P2 We'll consider working on this in future. (Assignee optional) type: bug and removed untriaged labels Jul 29, 2019
artem-zinnatullin (Contributor) commented:

I've submitted two fixes for this issue: #9008 and #9015.

I also think the HttpUrlConnection-based HttpDownloader, HttpConnector, and HttpConnectorMultiplexer should be deprecated in favor of more modern, non-blocking Netty-based code (HttpDownloadHandler), which would potentially need to be abstracted out so that both repository downloads and blob downloads could reuse it.

Would be happy to work on this if we achieve consensus 😊
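
A rough sketch of the kind of shared abstraction being suggested here; all names below are hypothetical and are not existing Bazel APIs:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.URL;
import java.util.List;

// Hypothetical interface that both repository fetches and remote-cache blob
// downloads could program against; the backing implementation could be the
// existing HttpUrlConnection code or a non-blocking Netty pipeline.
interface BlobDownloader {
  /** Downloads from the first reachable URL into {@code out}, falling back across URLs. */
  void download(List<URL> urls, OutputStream out) throws IOException;
}
```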

buchgr (Contributor) commented Jul 31, 2019

> Would be happy to work on this if we achieve consensus 😊

We've wanted to do this for a while; we just never got around to it. Feel free to send patches!

artem-zinnatullin (Contributor) commented:

Great!

Would it be OK to open a separate issue to outline the HttpUrlConnection → Netty migration details and track its progress?


Can you please consider #9015 and #9008 in the meantime, as they significantly improve the resiliency of repository downloads? 😊

bazel-io pushed a commit that referenced this issue Aug 5, 2019
As part of investigating #8974, I found that the recent change 982e0b8 broke retries of `SocketTimeoutException` in `HttpConnector`; the intention of that change was good, but the lack of tests resulted in broken logic.

IntelliJ highlights the problem: the branch with `instanceof SocketTimeoutException` would never have been executed:

![Screen Shot 2019-07-29 at 3 26 11 PM](https://user-images.githubusercontent.com/967132/62089179-d7bfa400-b21c-11e9-882a-1c1fe1fcb683.png)

---

This PR adds the missing tests and fixes the logic so that `SocketTimeoutException` is still presented as an `IOException` to upstream consumers while being handled properly internally in `HttpConnector`.

Closes #9008.

PiperOrigin-RevId: 261675244
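
The pitfall described in the commit message above can be illustrated with a minimal, hypothetical sketch (this is not the actual `HttpConnector` code): because `SocketTimeoutException` extends `IOException`, a branch that tests the broader type first makes the timeout-specific branch unreachable.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

class RetryPolicySketch {
  // BROKEN ordering: every SocketTimeoutException is also an IOException,
  // so the first branch always matches and the retry branch is dead code.
  static boolean shouldRetryBroken(Exception e) {
    if (e instanceof IOException) {
      return false;
    } else if (e instanceof SocketTimeoutException) {
      return true; // never reached
    }
    return false;
  }

  // FIXED ordering: test the most specific type first, so read timeouts can be
  // retried internally while other I/O errors still surface to callers.
  static boolean shouldRetryFixed(Exception e) {
    if (e instanceof SocketTimeoutException) {
      return true;
    }
    return false;
  }
}
```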
katre pushed a commit that referenced this issue Aug 6, 2019
Fixes #8974.

`HttpDownloader` never retried an `IOException` that occurred during `ByteStreams.copy(payload, out)`; `HttpConnector` would retry connection-phase errors but not payload errors.

This changeset adds fallback to the next URL(s), if present, for both payload read errors and connection errors, and adds the (completely) missing tests for `HttpDownloader`.

Note that this PR technically disables the questionable `HttpConnectorMultiplexer` optimization that attempts to connect to multiple URLs at the same time (by starting threads that race with each other) and picks the one that connects fastest. There is a way to keep that optimization while still falling back to the next URLs, but it would require every exception to carry the URL that caused it so that `HttpDownloader` could retry the download on the other URLs. I think making `HttpDownloader` more reliable, and actually using the provided `urls` as fallbacks, is much more important than that optimization.

Closes #9015.

PiperOrigin-RevId: 261702678
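
A minimal sketch of the fallback behavior this change describes, with assumed names and simplified logic (not the actual `HttpDownloader` implementation): each URL is tried in order, and a failure during either the connection phase or the payload copy moves on to the next URL.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class FallbackDownloaderSketch {
  // Hypothetical fallback loop: only the overall idea is taken from the
  // commit message above, none of these names exist in Bazel.
  void download(List<URL> urls, Path output) throws IOException {
    IOException lastError = null;
    for (URL url : urls) {
      try (InputStream in = url.openStream();           // connection-phase errors
           OutputStream out = Files.newOutputStream(output)) {
        in.transferTo(out);                             // payload read errors
        return;                                         // success: stop trying
      } catch (IOException e) {
        lastError = e;                                   // fall back to the next URL
      }
    }
    throw lastError != null ? lastError : new IOException("no URLs provided");
  }
}
```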
katre pushed further commits that referenced this issue on Aug 6, Aug 7, and Aug 12, 2019; their messages duplicate the two commit messages above.
@philwo philwo added the team-OSS Issues for the Bazel OSS team: installation, release process, Bazel packaging, website label Jun 15, 2020
@philwo philwo removed the team-OSS Issues for the Bazel OSS team: installation, release process, Bazel packaging, website label Nov 29, 2021