-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#40] Allow SSHKit.run executing on hosts in parallel #131
base: main
Are you sure you want to change the base?
Conversation
Running 'sleep 2' cmd in parallel (to 2 connections) and let's call it a pass when it returns its run time less than 4 sec. Sequential runs more than 4 sec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seungjin thank you so much for your contribution 🙂 👍
I've had a little time to take a closer look and added a few comments.
If you could add the corresponding changelog entries and documentation that'd be amazing.
We're more than happy to assist. If you need anything, please don't hesitate to let us know.
Cheers
assert String.trim(stdout(output1)) == host.options[:user] | ||
assert String.trim(stdout(output2)) == host.options[:user] | ||
|
||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 For the test, I would rather start 2 hosts. I would expect this to be the more common use case for the :parallel
feature.
You can start another host by adding another config entry to the :boot
tag. Maybe with a different user, so the output is actually different.
A small note on formatting:
- Please remove the blank line (line 33)
- Throughout the code base we use spaces after the comma between list items (on line 23, seems like Credo does not catch that)
Thank you 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, I would like to avoid relying on time in the tests since that may fail for a bunch of reasons and make the test suite flaky. I wonder if this could be tested in a different way? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tessi do you have an idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or @klappradla?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might need two tests :)
- to test parallel execution on two different hosts (as @pmeinhardt said) which is not required to test the actualy parallelism but instead focuses on correct execution (so we see when things break for real users in a real'ish scenario).
- test parallel execution on the same host, in this time also testing that things really run in parallel.
For 2. I agree that the time approach might* work -- but it makes our test suite at least 2 seconds slower. Instead I would suggest to test a side effect on that host which can only happen when two parallel ssh sessions happen. The question is which side effect to test. Possible options are:
- monitor currently active ssh shells. The command could monitor active ssh sessions and wait until two sessions are seen (maybe with a timeout to prevent lifelock).
- kinda crappy: a command could append numbers from 1 to 1.000.000 into a file. If both ssh sessions do that the file content would contain "unsorted" numbers
- if the two commands could be different: one could open a socket/file and read/wait on it. the other could write to that socket/file.
I personally prefer the first option, as it's the least crappy one :)
It seems the who
command lists currently logged in users
The who utility displays a list of all users currently logged on, showing for each user the login name, tty name, the date and time of login, and hostname if not local.
We could write (and then execute) a little bash script which runs who | wc -l
until two users are logged in. That bash script could be a while-loop wrapped in the timeout
command.
*) might because when tests are run on CI timing is always an issue. Maybe some tests needs longer than others and result in flaky tests. Things will always break in unexpected ways :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh and again (because I forgot to write this before), thanks @seungjin for your contribution 💚 I would love this feature to go in.
Shouldn't such kind of a parallelization be a client consideration? |
Hi @lessless, thanks a lot for bringing some life to this PR again 🌱💧 I think this is a very valid question that we have also asked ourselves. On the one hand, it's of course rather simple to achieve parallel execution even if the library does not provide it. On the other hand, it looks like such a common use case that it seems nice if we can avoid everyone having to re-roll their own copy of parallel task execution. We ourselves usually deploy to at least two machines for load-balancing and I assume there's really a lot of people out there who'll have similar needs. I personally think this (optimizing for the common use case) is a good reason for having it in the library. That said, we're super happy to hear and consider different opinions anytime though so please don't hesitate to share your thoughts! 💚 |
At the same time, sorry @seungjin for the long silence 🙇 I had another look at how we could verify that connections are established in parallel. Here's a thought:
Below is a rough outline visualizing what this looks like: The left pane is running the SSH server. The top-right, just sets up a test user. Both of these steps are taken care of by the Are you by any chance still up for working on this? Again, sorry for the long radio silence and thanks a lot for your contribution ❤️ |
In case you guys will decide to move on with this one concern is to properly propagate is exit codes and associated outputs in case of error back to the caller.
I’m not sure if I’m not being late here because I didn’t look at the code yet - we are just exploring bout options with SSHKit for now.
One of the drawbacks of doing things in parallel is more involving retry and recovery logic.
Can you please shed some lights on how are dealing with that or what available options will be there on a table after merging this PR?
Many thanks!
… On 21 Apr 2019, at 21:27, Paul Meinhardt ***@***.***> wrote:
At the same time, sorry @seungjin for the long silence 🙇
I had another look at how we could verify that connections are established in parallel. Here's a thought:
Start 2 different sshd Docker containers via the @boot tag
Run sth. like sleep 5 on both of these hosts via SSH using :parallel mode (the command should just keep the connection open long enough. but we'll kill it as soon as we detect success)
In parallel, run docker top [CONTAINER] for both hosts until we see a connection on both of them at the same time
Once we do see a session on both hosts, kill the containers, (prematurely) terminating the sleep so we're not wasting any time.
Below is a rough outline visualizing what this looks like:
The left pane is running the SSH server. The top-right, just sets up a test user. Both of these steps are taken care of by the @boot tag. The pane in the middle was used to open an SSH connection. The bottom-right pane shows the docker top output with the session active first, then without the SSH connection. We'd basically be looking for something like the first output on two containers "at the same time" to have a successful test.
Are you by any chance still up for working on this?
Again, sorry for the long radio silence and thanks a lot for your contribution ❤️
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Hey @lessless, there's currently no plan to integrate retry into the package – it seems to me like this would be highly application/use-case dependent. Any If you have additional questions, maybe we should move these into a separate issue. I'd definitely be happy to hear more about your use case. We're still working on improving SSHKit's interface and design so this kind of feedback is incredibly helpful ✌️ |
@pmeinhardt it will be amazing if we an bounce off few ideas but I'm not sure if it's a right thing to open a new issue for it. My use case in a nutshell is parallel
Here is more detailed description of the idea https://elixirforum.com/t/call-for-feedback-multi-node-deployment-tool-edeliver-2-0-design/21747. It'll be great if you can take look at it and let us know how we can utilize SSHKit to implement that design or what are possible riffs and obstacles to avoid. I'm also @lessless on Elixir Slack if anything. Cheers! |
Allow SSHKit.run executing on hosts in parallel
Description
SSHKit.run can do its job in sequential and parallel.
SSHKit.run(context, cmd, :sequential, 1000) runs cmd sequentially with its timeout 1000ms.
SSHKit.run(context, cmd, :parallel, 1000) runs cmd sequentially with its each cmd's timeout 1000ms.
Motivation and Context
Please refer to #40
Types of changes
Checklist
## master
section).