Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] brew search freezes on some GitHub API responses #2466

Closed
wants to merge 3 commits into from
Closed

[WIP] brew search freezes on some GitHub API responses #2466

wants to merge 3 commits into from

Conversation

claui
Copy link
Contributor

@claui claui commented Apr 8, 2017

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew tests with your changes locally?

Note: work in progress – DO NOT MERGE

Update: Fixed the curl size report.

Update 2: Updated next steps; also, added instructions to use HOMEBREW_NO_GITHUB_API to temporarily work around the issue.

Update 3: Clarified description; added two tasks (reuse parts of PR #21665; review commits)


The issue

Since yesterday (7th of April 2017), brew search freezes indefinitely on my system:

$ brew search hello
hello

The issue is reproducible on my system and happens 100 % of the time. It also affects brew install when I misspell a formula, which also triggers a search.

The freeze seems to be unrecoverable; for example, I’ve left it for over three hours as of right now, with no noticeable progress.

There is some evidence that the freeze specifically occurs while brew search queries the GitHub API to enumerate all the formulae and casks. Notably, whenever it gets to the caskroom/cask tap, the size of the response grows to more than 1 MiB:

/usr/bin/curl --remote-time --location \
  --user-agent \
    'Homebrew/1.1.12-24-gd84655efb9 (Macintosh; Intel macOS 10.11.6) curl/7.43.0' \
  'https://api.github.com/repos/Caskroom/homebrew-cask/git/trees/HEAD?recursive=1' \
  --header 'application/vnd.github.v3+json' \
  --write-out $'\n''%{http_code}' \
  -s \
  | tee /dev/fd/3 3> >(wc -c) | tail -1 \
  | env LC_ALL=en_US.UTF-8 xargs printf \
    "=> HTTP status: %d\n=> Response size: %'.f bytes\n"

=> HTTP status: 200
=> Response size: 1,082,773 bytes

My environment

brew tap

The output of brew tap is:

bsiegel/tap
caskroom/cask
caskroom/versions
claui/aged
claui/cask-unofficial
claui/cask-versions
claui/core
claui/test
ethereum/ethereum
homebrew/boneyard
homebrew/caskroom
homebrew/core
homebrew/dev-tools
homebrew/dupes
homebrew/games
homebrew/gui
homebrew/python
homebrew/science
homebrew/services
homebrew/x11
osx-cross/avr
rosaland/rosa
vitorgalvao/tiny-scripts
watsonbox/cmu-sphinx

brew doctor

This is what brew doctor says:

Please note that these warnings are just used to help the Homebrew maintainers
with debugging if you file an issue. If everything you use Homebrew for is
working fine: please don't worry and just ignore them. Thanks!

Warning: You have unlinked kegs in your Cellar
Leaving kegs unlinked can lead to build-trouble and cause brews that depend on
those kegs to fail to run properly once built. Run `brew link` on these:
  bugit

brew config

The output of brew config is:

HOMEBREW_VERSION: 1.1.12-24-gd84655efb9
ORIGIN: https://github.com/Homebrew/brew.git
HEAD: d84655efb9dba5840d75e08da5b69f5fd0c56744
Last commit: 3 days ago
Core tap ORIGIN: https://github.com/Homebrew/homebrew-core
Core tap HEAD: c57980958c9c81bbe7a5313cb8150d8748c2d375
Core tap last commit: 46 minutes ago
HOMEBREW_PREFIX: /usr/local
HOMEBREW_REPOSITORY: /usr/local/Homebrew
HOMEBREW_CELLAR: /usr/local/Cellar
HOMEBREW_BOTTLE_DOMAIN: https://homebrew.bintray.com
CPU: octa-core 64-bit ivybridge
Homebrew Ruby: 2.0.0-p648
GCC-4.2: build 5666
Clang: 8.0 build 800
Git: 2.11.0 => /usr/local/bin/git
Perl: /usr/bin/perl
Python: /usr/local/bin/python => /usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/bin/python2.7
Ruby: /Users/claudia/.rbenv/shims/ruby => /usr/local/var/rbenv/versions/2.2.5/bin/ruby
Java: 1.8.0_121, 1.7.0_80
macOS: 10.11.6-x86_64
Xcode: 8.2.1
CLT: 8.2.0.0.1.1480973914
X11: 2.7.11 => /opt/X11

Other environment variables

Lastly, here are some of the environment variables my shell exports to Homebrew:

GITHUB_TOKEN="[redacted]"
GITPERLLIB="/usr/local/lib/perl5/site_perl"
GIT_EDITOR="/usr/local/bin/mate -w"
HOMEBREW_DEVELOPER="1"
HOMEBREW_GITHUB_API_TOKEN="[redacted]" # same value as GITHUB_TOKEN
HOMEBREW_NO_ANALYTICS="1"
LANG="en_US.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_CTYPE="de_DE.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
PATH="/usr/local/opt/jenv/shims:/usr/local/sbin:/Users/claudia/.rbenv/shims:/Users/claudia/.rbenv/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin:/Users/claudia/.gem/ruby/2.2.0/bin:/Users/claudia/Documents/dev/system:/Users/claudia/Documents/dev/system/git-hooks:/Users/claudia/Documents/dev/system/jpeg:/usr/local/opt/go/libexec/bin:/Users/claudia/Documents/dev/go/bin"
RBENV_SHELL="bash"
SHELL="/bin/bash"
TMPDIR="/var/folders/9k/[redacted]/T/"

Analysis

The title bar of my Terminal window shows that the freeze always occurs while curl is the active process.

However, while e. g. brew search hello is stuck, running pgrep -afl brew (in another Terminal window) reveals that the freeze always happens during the one specific GitHub API query which asks for the file tree of the caskroom/cask tap.

This is what the output of pgrep -afl brew says:

70924 /System/Library/Frameworks/Ruby.framework/Versions/Current/usr/bin/ruby -W0 /usr/local/Homebrew/Library/Homebrew/brew.rb search hello
70961 /usr/bin/curl --remote-time --location --user-agent Homebrew/1.1.12-24-gd84655efb9 (Macintosh; Intel macOS 10.11.6) curl/7.43.0 https://api.github.com/repos/Caskroom/homebrew-cask/git/trees/HEAD?recursive=1 --header application/vnd.github.v3+json --write-out 
%{http_code} /usr/bin/curl --remote-time --location --user-agent Homebrew/1.1.12-24-gd84655efb9 (Macintosh; Intel macOS 10.11.6) curl/7.43.0 --progress-bar --fail --header Authorization: token [redacted] --dump-header /tmp/github_api_headers20170408-[redacted]

My network connection seems fine; curl will freeze no matter whether I’m at work or at home.

According to stat, the temporary file which is supposed to receive the API response headers seems stuck at 0 bytes size, and timestamps matching the spawn time of the curl process:

16777220 10223500 -rw------- 1 claudia wheel 0 0 "Apr  8 12:23:22 2017" "Apr  8 11:58:34 2017" "Apr  8 11:58:34 2017" "Apr  8 11:58:34 2017" 4096 0 0 /tmp/github_api_headers20170408-[redacted]

However, whenever I do the same curl command manually, it works perfectly fine:

/usr/bin/curl --remote-time --location \
  --user-agent \
    'Homebrew/1.1.12-24-gd84655efb9 (Macintosh; Intel macOS 10.11.6) curl/7.43.0' \
  'https://api.github.com/repos/Caskroom/homebrew-cask/git/trees/HEAD?recursive=1' \
  --header 'application/vnd.github.v3+json' \
  --write-out $'\n''%{http_code}'

Suspected cause

There’s something fishy about the response I get from the above curl command when I count the lines:

/usr/bin/curl --remote-time --location \
  --user-agent \
    'Homebrew/1.1.12-24-gd84655efb9 (Macintosh; Intel macOS 10.11.6) curl/7.43.0' \
  'https://api.github.com/repos/Caskroom/homebrew-cask/git/trees/HEAD?recursive=1' \
  --header 'application/vnd.github.v3+json' \
  --write-out $'\n''%{http_code}' \
  -s \
  | tee /dev/fd/3 3> >(wc -c) | tail -1 \
  | env LC_ALL=en_US.UTF-8 xargs printf \
    "=> HTTP status: %d\n=> Response size: %'.f bytes\n"

which returns:

=> HTTP status: 200
=> Response size: 1,082,773 bytes

You’ll notice that the length of the API response is just barely over 1 MiB, a number that has a suspicious smell to me. A hint of an I/O-related deadlock?

Last year, we actually had an issue in Homebrew-Cask with similar symptoms. Turned out that the Hbc::SystemCommand class would not handle stdout/stderr streams properly, leading to I/O starvation and eventually to a freeze (which happened as soon as the starved stream grew big enough to fill Ruby’s internal I/O buffers, I suppose).

The issue has since been fixed in Hbc::SystemCommand; however, something similar might contribute to the issue at hand. Who knows.

Known workarounds

  • You can tell Homebrew not to use the GitHub API by running HOMEBREW_NO_GITHUB_API=1 brew search instead of brew search. This temporarily keeps Homebrew from fetching the API response.

Next steps

I plan to continue working on this right away; that’s why I made it a PR, not an issue.
Anyone is free to give it a shot though!

@Homebrew/maintainers Feel free to assign core, do not merge and in progress labels if you want.

  • Figure out where in the code the specific curl call happens

  • Figure out whether the code handles I/O streams properly (see examples for safe I/O handling in Ruby) Update: looks like it doesn’t.

  • Write tests to reproduce the behavior, possibly similar to our existing tests for the Hbc::SystemCommand class

  • Apply a fix similar to the one in Update: Factor out a piece of code from PR #21665 first; that code has been in production for a while so it might make sense to reuse it

  • Apply fix to curl_output, reusing the factored-out code

  • Review commits

@vitorgalvao vitorgalvao added the in progress Maintainers are working on this label Apr 8, 2017
@reitermarkus
Copy link
Member

reitermarkus commented Apr 8, 2017

Last year, we actually had an issue in Homebrew-Cask with similar symptoms.

We also had this issue, which was caused by curl not receiving an EOF: Homebrew/homebrew-cask#23050

However, I cannot reproduce this.

@claui
Copy link
Contributor Author

claui commented Apr 8, 2017

@reitermarkus It may work for most users today but it’s bound to break for everyone eventually.

I have isolated a small test case that can be run from Terminal without Homebrew. If you run the following command line from Bash (doesn’t matter which version of Ruby/MRI you have installed), does it return after 2 seconds, or does it freeze for you?

ruby -ropen3 -e 'Open3.popen3("for i in $(seq 1 2 100000); do echo $i; echo $(($i + 1)) >&2; done") { |_, stdout, stderr, wait_thread| puts "Reading stdout"; stdout.read; puts "Reading stderr"; stderr.read }; puts "Done"'

@vitorgalvao
Copy link
Member

@claui I get only Reading stdout and it stays there.

@ghost
Copy link

ghost commented Apr 8, 2017

freezes saying Reading stdout

@claui
Copy link
Contributor Author

claui commented Apr 8, 2017

@vitorgalvao @monouser7dig Thank you both. This is basically what the method curl_output does in utils/curl.rb.

@claui
Copy link
Contributor Author

claui commented Apr 9, 2017

I have added a spec which now fails reliably. It triggers the bug in curl.rb, then fails if the thread freezes.

I expect this test to fail on every system, including Travis.

@MikeMcQuaid
Copy link
Member

Wow, great work writing a test case for this and your general debugging skills @claui! I need to look at this in more detail but just wanted to give a 👍 before I do so. Thanks again!

@claui
Copy link
Contributor Author

claui commented Apr 9, 2017

Thanks a bunch @MikeMcQuaid! Happy you find it helpful 😊

@MikeMcQuaid
Copy link
Member

@claui This may be addressed by #2540 which means the API call that's returned won't be so ludicrously huge. Can you confirm if this seems to fix things for you? Thanks!

@MikeMcQuaid
Copy link
Member

(to clarify by "fixed" I mean "fixed for brew search"; obviously there's underlying issues with how we handle curl here (most likely specifically curl_output) but if we no longer need to use it in a problematic way: that's good enough for me (although I'd welcome a "proper" fix).

@claui
Copy link
Contributor Author

claui commented Apr 25, 2017

@MikeMcQuaid 👍 for #2540!

That said, the search issue disappeared some time last week from my system. I can’t reproduce it now, even with an earlier commit checked out, e. g. d84655e.
I’m still working on some tests but as of now, my fix is feature complete.

@claui
Copy link
Contributor Author

claui commented Apr 25, 2017

@MikeMcQuaid

if we no longer need to use it in a problematic way: that's good enough for me

Agreed.

(although I'd welcome a "proper" fix).

The most important thing to me it that Homebrew just works – no matter if involuntarily or with a proper (by whose definition?) fix.

That said, curl.rb still remains in use for some core parts (e. g. github.rb and download_strategy.rb). Unexpected changes in a remote resource can produce lots of STDERR output any moment. Having a lurking I/O issue triggered on top of such a situation will not be really helpful for troubleshooting.

@MikeMcQuaid
Copy link
Member

@claui Seems reasonable to me 👍 Will leave this open for you to be able to work on it further. Thanks again!

@MikeMcQuaid MikeMcQuaid removed the in progress Maintainers are working on this label Jun 2, 2017
@stale stale bot added the stale No recent activity label Jun 2, 2017
@stale
Copy link

stale bot commented Jun 2, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@claui
Copy link
Contributor Author

claui commented Jun 8, 2017

I’m actively working on unit tests. While doing so, I have discovered (and smoothed out) a couple of edge cases in the fix.

What do I do to make that bot go away?

@stale stale bot removed the stale No recent activity label Jun 8, 2017
@MikeMcQuaid
Copy link
Member

@claui Your posting a comment was enough. We'd love to see more, smaller PRs if that helps this progress at all. Thanks!

@claui
Copy link
Contributor Author

claui commented Jun 9, 2017

@MikeMcQuaid I have opened a smaller PR as per your suggestion. That makes it easier indeed!
I figure this one should remain WIP until #2762 is merged.

@claui
Copy link
Contributor Author

claui commented Jun 12, 2017

I have now updated this PR with the completed fix.

It builds upon #2762, which is currently under review.

To see the relevant diff, you can just zoom in on the individual commit.

claui added 3 commits June 13, 2017 12:53
This commit sets the stage for an upcoming fix for #2466.
In a nutshell, this is what it does:

- In `cask/system_command.rb`, factor out existing code (“IO
  selection”) that we’re going to need later in order to fix
  PR #2466;

- move said code into its own class `Utils::IOSelector`;

- split the class into smaller methods to make the code more
  expressive than before; and

- add unit tests for `Utils::IOSelector` (they’re a bit bloated
  because edge cases.)
This commit adds a couple of tests for the utility methods in
`utils/curl.rb`, including a few common and edge cases. One of the
edge cases with large stdout/stderr outputs should fail due to an
issue in the `curl_output` method.

Note that this commit also introduces a mock for the `/usr/bin/curl`
binary, which gets injected into the methods under test via the
`HOMEBREW_CURL` environment variable.

References:

- #2466

- https://github.com/Homebrew/brew/blob/d84655efb9dba5840d75e08da5b69f5fd0c56744/Library/Homebrew/utils/curl.rb#L40
This addition to `io_selector.rb` prevents the `curl_output`
method from blocking indefinitely for certain outputs of the curl
binary.

This also fixes the failing test in `curl_test.rb`.

Reference:

- #2466
@stale stale bot added the stale No recent activity label Jul 4, 2017
@stale
Copy link

stale bot commented Jul 4, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot closed this Jul 12, 2017
@lock lock bot added the outdated PR was locked due to age label Jan 18, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jan 18, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated PR was locked due to age stale No recent activity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants