Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Disabling of Connection Pool #448

Merged
merged 4 commits into from
Jan 30, 2019
Merged

Fix Disabling of Connection Pool #448

merged 4 commits into from
Jan 30, 2019

Conversation

tisba
Copy link
Contributor

@tisba tisba commented Jan 29, 2019

Fixes #438

The README currently states, that:

SSHKit::Backend::Netssh.pool.idle_timeout = 0

will disable the cache and this can be used to mitigate potential problems from connection pooling. As described in #438 quite the opposite was the case. Connections were not cached, BUT the cache eviction loop was still started. With idle_timeout = 0 the the eviction loop becomes a busy loop hogging one core at 100%.

This PR will skip the launch of the eviction loop (via run_eviction_loop) in case the cache is disabled. If the cache is disabled, there can't be any timed_out_connections because silently_close_connection_later is not used.

find_cache does not use thread_safe_find_or_create_cache if cache is disabled:

# Look up a Cache that matches the given connection arguments.
def find_cache(args)
if cache_enabled?
key = cache_key_for_connection_args(args)
caches[key] || thread_safe_find_or_create_cache(key)
else
NilCache.new(method(:silently_close_connection))
end
end

I'd like to get some guidance on how to test this (@leehambley maybe?).

Further Suggestion

I also would like to propose that there is a maximum eviction frequency. The idle_timeout should not guarantee when idle connections are removed. Rather it should be considered "after idle_timeout sshkit will evict the connection eventually; never before". If you agree with this, we could simplify the sleep from this:

# Loops indefinitely to close connections and to find abandoned connections
# that need to be closed.
def run_eviction_loop
loop do
process_deferred_close
# Periodically sweep all Caches to evict stale connections
sleep([idle_timeout, 5].min)
caches.values.each(&:evict)
end
end

to

sleep 5

@capistrano-bot
Copy link

capistrano-bot commented Jan 29, 2019

1 Warning
⚠️ There are code changes, but no corresponding tests. Please include tests if this PR introduces any modifications in behavior.

Generated by 🚫 Danger

@leehambley
Copy link
Member

@tisba thanks for the submission, this seems like a great fix. It's dirt simple, and the explanation in the PR is longer than the code sample, which inspires confidence.

I've requested a review from @mattbrictson if he's OK with it, let's roll it and get it out in the next release!

@tisba
Copy link
Contributor Author

tisba commented Jan 29, 2019

Nice!

Do you want to me to update the changelog, as the nice @capistrano-bot mentioned? :)

Also, what about the other suggestion regarding the slight adjustment of the eviction loop as well?

@leehambley
Copy link
Member

Do you want to me to update the changelog, as the nice @capistrano-bot mentioned? :)

Probably a good idea!

Also, what about the other suggestion regarding the slight adjustment of the eviction loop as well?

That concerns me somewhat more than this, lets keep them separate? @mattbrictson what do you think?

CHANGELOG.md Show resolved Hide resolved
@mattbrictson
Copy link
Member

Also, what about the other suggestion regarding the slight adjustment of the eviction loop as well?

I think it is safe to include. Just to clarify, we're talking about changing this line:

 sleep([idle_timeout, 5].min)

to be:

sleep(5)

correct?

There are code changes, but no corresponding tests. Please include tests if this PR introduces any modifications in behavior.

I am OK with accepting the PR without tests.

Thanks so much for this PR! 🙏

@tisba
Copy link
Contributor Author

tisba commented Jan 30, 2019

This is exactly the change I would like to suggest.

With 162cbfd the eviction of timed out connection could obviously take a couple of seconds, but (like I already outlined) the semantics of "will eventually be closed/removed" is not something that users of sshkit should be surprised by.

@leehambley leehambley merged commit 15251f1 into capistrano:master Jan 30, 2019
@leehambley
Copy link
Member

Danke für den Einsatz!

Thanks for your contribution, may your SideKiq workers spin less frequently on a busy loop! 🏆

@tisba tisba deleted the gh-438 branch January 30, 2019 09:24
@mattbrictson
Copy link
Member

We generally publish new SSHKit releases to rubygems every six weeks, and since the most recent release was just a few days ago, that would mean this fix doesn't get published until about 5 weeks from now. But I'm happy to make exceptions for important bug fixes. Does this qualify?

@tisba
Copy link
Contributor Author

tisba commented Jan 31, 2019

IMO this is a nasty bug, even though sshkit users might run into it only extreme rare cases. What makes it "worse" IMO, is that the current README (regarding the ability to disable the connection cache) directly leads to the issue fixed by this PR.

On the other hand the mitigation (once you've figured out what is going on), is rather trivial: Don't disable the connection pool.

Me personally I'm okay with not rushing this.

@mattbrictson
Copy link
Member

Actually, I'll make a release today. These sorts of major bug fixes don't happen very often.

@mattbrictson
Copy link
Member

🚀 released as 1.18.2

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Nov 18, 2019
Update ruby-sshkit package to 1.20.0.


## [1.20.0][] (2019-08-03)

  * [#468](capistrano/sshkit#468): Make `upload!` take a `:verbosity` option like `exec` does - [@grosser](https://github.com/grosser)

## [1.19.1][] (2019-07-02)

  * [#465](capistrano/sshkit#456): Fix a regression in 1.19.0 that prevented `~` from being used in Capistrano paths, e.g. `:deploy_to`, etc. - [@grosser](https://github.com/grosser)

## [1.19.0][] (2019-06-30)

  * [#455](capistrano/sshkit#455): Ensure UUID of commands are stable in logging - [@lazyatom](https://github.com/lazyatom)
  * [#453](capistrano/sshkit#453): `as` and `within` now properly escape their user/group/path arguments, and the command nested within an `as` block is now properly escaped before passing to `sh -c`. In the unlikely case that you were manually escaping commands passed to SSHKit as a workaround, you will no longer need to do this. See [#458](capistrano/sshkit#458) for examples of what has been fixed. - [@grosser](https://github.com/grosser)
  * [#460](capistrano/sshkit#460): Handle IPv6 addresses without port - [@will-in-wi](https://github.com/will-in-wi)

## [1.18.2][] (2019-02-03)

  * [#448](capistrano/sshkit#448): Fix misbehaving connection eviction loop when disabling connection pooling - [Sebastian Cohnen](https://github.com/tisba)

## [1.18.1][] (2019-01-26)

  * [#447](capistrano/sshkit#447): Fix broken thread safety by widening critical section - [Takumasa Ochi](https://github.com/aeroastro)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ConnectionPool: idle_timeout of 0 causes 100% and busy looping
4 participants