Input file with list of commands to run async on nodes #75

dgrisham · 2018-09-05T23:29:37Z

I have an application where I want to easily run commands asynchronously on nodes. While the new async update to iptb run is awesome and covers some of my use cases, there are others where I want to asynchronously run different commands on nodes. This adds that functionality.

Usage: Pass --cmdFile <cmds> to iptb run, where <cmds> is a file where each line is of the form <nodeRange>: <cmd> -- <cmd> is the command to run on all of the nodes in <nodeRange>. Each <nodeRange> should have the same format as that accepted by commands.parseRange().

Example <cmds> file:

0: echo hi
[1-3]: echo bye
[4-5]: echo hello

It is not necessary that all nodes are account for, and nodes can also appear more than once. For example, if there are 3 nodes, we can do:

0: echo hi
[0,2]: echo bye

I'm curious what others' thoughts are on the utility of this feature, and the implementation.

travisperson · 2018-09-06T01:56:17Z

I want to asynchronously run different commands on nodes

I can see a use case for this, or at least can see that trying to accomplish this might be difficult from the cli or a shell script.

This is sort of the same idea as the topology idea in #65 / #76. The issue here though is that you are looking for all commands to start executing at or around the same time.

I think we can solve two problems here if we make it a bit more generic in our implementation, that would allow for it to also be used in the similar way to connect nodes.

A generic way to approach this would be to make every line of the file be the arguments to the command, after flags.

run

0 -- echo "Running on node 0"
[1-3] -- echo "Running on node 1,2,3"
[4-5] -- echo "Running on node 4,5"
-- echo "Running on all nodes"

connect

0 1
[2-4] [5-8]

The basic implementation to use across supported commands would be to treat everything like it was coming from a file, but the single use on the cli, would just be a file with a single line. Only the arguments section of the command would be contained in the file. No flags could be passed on a per line basis.

What are your thoughts @dgrisham?

/cc @davinci26

dgrisham · 2018-09-06T05:36:09Z

@travisperson Yeah, that all sounds great to me. I did feel like my implementation could be a bit cleaner in its connection with the CLI/existing implementation, and I think your suggestion addresses that. I can pull together an implementation of this and we can move forward from there?

dgrisham · 2018-09-06T05:37:57Z

Also -- are you thinking of having the same kind of interface as I initially suggested? Specifically having a --cmdFile flag for iptb run (and, I suppose, a --topology flag for iptb connect)?

dgrisham · 2018-09-06T07:22:14Z

That last commit reflects the approach you suggested, @travisperson.

davinci26 · 2018-09-06T14:28:21Z

I really like the idea in general, I think it is nice superset of #76 . I would really like to see comment lines starting with # or // which is not a big overhead.

In terms of weight, if both parsers are implemented correctly and tested thoroughly then since they are an overlay to IPTB I think they will require minimal maintanance.

travisperson · 2018-09-06T22:27:51Z

I would really like to see comment lines starting with # or // which is not a big overhead.

I think we can support that. I would be an advocate for #.

Specifically having a --cmdFile flag for iptb run (and, I suppose, a --topology flag for iptb connect)?

I'm wondering what your thoughts on using stdin might be? I like the approach of using stdin because it ends up being really clean for iptb when used within a shell script (common for testing), but also supports passing a file when needed easily.

iptb run <<CMD
0     -- echo "Running on node 0"
[1-3] -- echo "Running on node 1,2,3"
CMD

iptb run < cmds.txt

Regardless I would like both run and connect to support the same usage because they both will be collecting a list of arguments to run. I think this pattern might be expanded further to other commands as well and it will be nice to keep them all similar.

We do loose the stdin for any future use, but at the moment I don't think there is a need for it. It would be useful for iptb run to pass a stdin to the command being ran on the node. However, that is tricky in itself as we'd need to buffer the entire input to pass along to multiple calls to RunCmd.

Also, with this PR, we'd want to be able to control stdin differently as you'd want to be able to supply the stdin for each line. Something we can look at down the road I think.

dgrisham · 2018-09-07T03:04:45Z

@travisperson

I'm wondering what your thoughts on using stdin might be?

I love the idea -- cleans up the interface even more and seems like a very natural approach (moreso than --cmdFile). I'm taking a stab at the implementation now, it's not immediately obvious to me what the best way to handle this code-wise is but that's what code reviews are for :)

dgrisham · 2018-09-07T03:30:05Z

Attempt made, seems to work well as far as I can tell.

dgrisham · 2018-09-07T04:48:57Z

Note on implementation: iptb run will wait for STDIN input if ran with no args and STDIN is initially empty. I think that's because I used a bufio.Scanner instead of bufio.Reader to read STDIN. Is this acceptable behavior? iptb run --help can still access the usage info, but it's overall inconsistent with how other commands response in similar cases. I think we can use bufio.Reader to check whether there's buffered input, and if not we can print the usage, which seems better to me.

Edit: Hm, after playing around with bufio.Reader it seems like that might not actually work here.

travisperson · 2018-09-07T05:22:34Z

I think we can use bufio.Reader to check whether there's buffered input, and if not we can print the usage, which seems better to me.

Ya, there are a few ways to check if stdin has data. It's a bit tricky though to do it for all platforms nicely. There are some packages that handle it all (https://github.com/mattn/go-isatty). I want to go find how the go-ipfs project does it though it would be good to do it the same way.

wait for STDIN input if ran with no args

I think this is okay for now, works well enough for this branch at least at the moment till we can figure out how go-ipfs does it.

Have you been able to run an example? I tried and was not getting all the output I expected. I think we will need to do some scope management to make sure everything executes correctly.

dgrisham · 2018-09-07T05:35:58Z

Have you been able to run an example?

Yeah, it works in the form iptb run <cmds to me. And the example you posted before also works:

$ iptb run <<CMD
0     -- echo "Running on node 0"
[1-3] -- echo "Running on node 1,2,3"
CMD
node[0] exit 0

"Running on node 0"

node[1] exit 0

"Running on node 1,2,3"

node[2] exit 0

"Running on node 1,2,3"

node[3] exit 0

"Running on node 1,2,3"

dgrisham · 2018-09-07T06:00:21Z

Also, because I don't want to forget -- it's worth considering how commands.mapListWithOutput() is implemented. It wraps commands.mapWithOutput(), which means that it's hierarchical. So, for example, if you run:

iptb run <<CMD
[0,1] -- echo "Running on nodes 0, 1"
[2, 3] -- echo "Running on nodes 2, 3"
CMD

then mapListWithOutput() will start a goroutine that runs mapWithOutput() on [0,1], then another that runs on [2,3]. Each of those goroutines will then spawn two more goroutines, one for each of the nodes they're responsible for, that actually run the commands on each node. I think this is probably fine, but wanted to bring it up in case another approach would be preferred -- e.g. having mapListWithOutput() spawn the goroutines for all of the nodes directly that run the commands.

travisperson · 2018-09-07T06:28:20Z

Ah thanks, I figured out what I was doing wrong.

travisperson · 2018-09-07T06:54:53Z

commands/run.go

+				cArgsStr[i] = arg
+			}
+			args = append(args, cArgsStr)
+			terminatorPresent = append(terminatorPresent, c.IsSet("terminator"))


Something I think we can do here to make this simpler is to normalize everything to an io.Reader that we can pass to bufio.NewScanner regardless of using stdin, or passing through arguments.

So for the non-stdin case, we construct an io.Reader using strings.NewReader, we can then just hand it to bufio.NewScanner and let it be handled the exact same way, allowing us to move that scanner logic out, and into the main execution path. This also helps clean up the some of the terminator logic, as we can just assume everything will have it present (we can inject it back in on the args case).

We only need the terminator flag for the cli args case, as the cli library will swallow it, in the stdin case, the flag terminator is always present (if the user supplied it, if they didn't it's an error anyways).

Hm, I like the abstraction but to make it happen I think we would have to reconstruct the input argument into a single string in order to pass it to strings.NewReader in the non-stdin case. This means that cli would break up the string for us, we'd put it back together, then take it apart again once we processes the command. Seems a bit roundabout, but I'm not sure how else to make that happen since we don't seem to have access to the original string once we're in the Action function.

Also, is it necessarily the case that not passing -- should result in an error? Seems like the following could be supported:

iptb run <<CMD 0 "Running on node 0" echo "Running on all nodes" CMD

Depends on what we want here, though.

Edit Scratch that, I just realized that iptb run echo "Running on all nodes" was never supported in the first place.

This means that cli would break up the string for us, we'd put it back together, then take it apart again once we processes the command. Seems a bit roundabout

Ya, it does feel a bit weird, but it does provide a really simple abstraction. We don't need to do it though, as long as we think the approach we take seems simple and will be fairly straight forward to reuse.

Also, is it necessarily the case that not passing -- should result in an error?

It would be nice to make any line of the cmd file be valid arguments to the command itself. It makes it easy to copy a single line, and does not require anything special if a user decided they want to quickly tests running each line of the cmd as a separate iptb run by reading each line on their own and applying it to iptb run.

I think the abstraction is worth it in this case, working on it now.

And re:

It would be nice to make any line of the cmd file be valid arguments to the command itself.

Agreed with that, I was going for the same but was forgetting how iptb run worked.

dgrisham · 2018-09-07T21:22:41Z

@travisperson I still want to discuss the reader abstraction point you mentioned, just cleaned up the terminator logic for now.

dgrisham · 2018-09-09T04:03:49Z

commands/run.go

+		var args [][]string
+		scanner := bufio.NewScanner(reader)
+		for scanner.Scan() {
+			tokens := strings.Fields(scanner.Text())


This is actually bad idea -- for example, when you run iptb run 0 -- sh -c 'echo hello >file', I think the 'echo hello >file' portion needs to be passed as a single argument to the this exec function. However, the strings.Fields() function splits on spaces, which means 'echo, hello, and >file' get passed separately. Because of this, commands like that end up failing. I'm not quite sure how to handle this at the moment.

cc'ing @travisperson for thoughts

We should be able to use something like this: https://github.com/kballard/go-shellquote

👍 ended up going with https://github.com/mattn/go-shellwords instead (based on code/tests), same idea though.

dgrisham · 2018-09-09T07:48:49Z

Something I want to note here because it's kind of obscured in the outdated review comments now: the current version of this introduces a new dependency, https://github.com/mattn/go-shellwords. I'm not sure what the protocol is for doing something like that. There aren't a lot of packages dedicated to parsing shell strings -- the other one we found was https://github.com/kballard/go-shellquote. I chose the former package because of the number of tests it had, and for the coding style.

travisperson · 2018-09-11T02:06:44Z

I chose the former package because of the number of tests it had, and for the coding style.

👍

the current version of this introduces a new dependency, https://github.com/mattn/go-shellwords. I'm not sure what the protocol is for doing something like that.

It would be nice to have it gxed. This basically just takes forking and adding a package.json.

Also with regard to an earlier discussion, checking if stdin has data, there is a gxed package: https://github.com/gxed/go-isatty

dgrisham · 2018-09-11T07:42:46Z

@travisperson I gx'ed the package ~~here~~ (moved to a different repo, see edit below) and added it to our package.json.

I also found a way to check for data on stdin in a pretty straightforward way, just using Go's os package. The current implementation allows for each of these usages:

$ iptb run 0 -- echo hi
$ iptb run <<CMD
0 -- echo hi
CMD
$ echo '0 -- echo hi' | iptb run

but gives an error if the input is just:

$ iptb run
error: no command input and stdin is empty

It basically just checks if there's something to read on stdin, or if there's an open pipe to stdin. If neither are the case, it prints an error.

Edit: Gxed packaged is now at https://github.com/gxed/go-shellwords.

Edit 2: The error output is now:

$ iptb run
NAME:
   iptb run - run command on specified nodes (or all)

USAGE:
   iptb run [nodes] -- <command...>

CATEGORY:
   CORE

error: no command input and stdin is empty

commands/run.go

Usage: Pass `--cmdFile <cmds>` to `iptb run`, where `<cmds>` is a file where each line is of the form `<nodeRange>: <cmd>` -- `<cmd>` is the command to run on all of the nodes in `<nodeRange>`. The format of `<nodeRange>` should be anything accepted by the `commands.parseRange()` function. License: MIT Signed-off-by: David Grisham <[email protected]> rebase me

License: MIT Signed-off-by: David Grisham <[email protected]>

Signed-off-by: David Grisham <[email protected]> License: MIT

License: MIT Signed-off-by: David Grisham <[email protected]>

Also mattn/go-shellwords -> gxed/go-shellwords. License: MIT Signed-off-by: David Grisham <[email protected]>

License: MIT Signed-off-by: David Grisham <[email protected]>

dgrisham · 2018-09-12T22:34:55Z

@travisperson I think I'm happy with this in its current state. If you have a chance/haven't already, maybe make sure you're happy with the mapListWithOutput function I introduced and let me know if there's anything else you see that you think should be changed. Once we verify that we're happy with how this works, I'll update the documentation to elaborate a bit on how this works.

travisperson

Just some race concerns. Otherwise I think this is good to go. When we extend it to some other commands we can possibility look at breaking it up some more. No need to go down that route yet though I think.

travisperson · 2018-09-12T23:27:09Z

commands/utils.go

+		wg.Add(1)
+		go func() {
+			defer wg.Done()
+			results_i, err := mapWithOutput(list, nodes, fns[i])


I'm pretty sure we want to pass list and i in as parameters. I'm pretty sure this is a race issue. On small workloads it probably won't show up, but when things actually start to get schedule the value of i or list may have changed.

Also something we need to make clear in our plugin documentation (none existed at the moment) is that they need to be thread safe, as now with this format the same node could be access at the same time.

Ah, right, this is my first time using goroutines and I'm still getting used to stuff like this 👍

And gotcha on the documentation, I can throw that in when I do this rest of the documentation as well.

Oh, just realized you said that there isn't any plugin documentation at all. I'll at least make an issue to remind us.

License: MIT Signed-off-by: David Grisham <[email protected]>

dgrisham · 2018-09-13T17:53:11Z

@travisperson Race condition fixed, and documentation added.

travisperson

LGTM

dgrisham · 2018-09-14T20:23:02Z

@travisperson Hey, just caught an inconsistency -- we decided to move the check for stdin/pipe input from Action to Before (here). This means that running only iptb run prints the usage/description/etc. and, after all of that, the error output.

By contrast, iptb shell just prints the error itself, and not usage/etc.:

$ iptb shell
Usage Error: shell takes exactly 1 argument

These seem like they should behave the same, do you have a preference as to which one should change?

Edit: More info, iptb testbed and iptb attr print out usage/etc., while iptb metric just prints out an error like iptb shell. I'll have iptb run mirror testbed and attr, and we can discuss standardizing this in a different issue.

travisperson · 2018-09-14T20:52:32Z

More info, iptb testbed and iptb attr print out usage/etc., while iptb metric just prints out an error like iptb shell

Anything that is printing usage is due to having sub commands.

There is some commented out code in main.go that handles the inconsistency by supporting a UsageError error but its dependent on some code which we haven't updated too yet, and it also has a bug at the moment.

More Info

dgrisham · 2018-09-14T20:58:54Z

Ah, okay, I see. I guess that means I shouldn't have wrapped my error in NewUsageError, because when this change is applied it would end up printing the iptb run command help twice. Thanks for the explanation!

dgrisham requested a review from travisperson September 5, 2018 23:29

dgrisham force-pushed the feat/async-run-list branch from 84a1be4 to f50abc1 Compare September 6, 2018 07:33

This was referenced Sep 6, 2018

Connection Topology #76

Open

Collecting statistics #77

Open

dgrisham force-pushed the feat/async-run-list branch 2 times, most recently from c61747a to 019ae29 Compare September 7, 2018 06:04

travisperson reviewed Sep 7, 2018

View reviewed changes

dgrisham commented Sep 9, 2018

View reviewed changes

davinci26 mentioned this pull request Sep 11, 2018

Connect nodes in parallel #67

Open

dgrisham force-pushed the feat/async-run-list branch from bbc0f8e to c17d30d Compare September 11, 2018 19:38

travisperson reviewed Sep 11, 2018

View reviewed changes

commands/run.go Outdated Show resolved Hide resolved

dgrisham added 2 commits September 12, 2018 16:28

Reworking implementation to be more generalizable.

a5efdd6

License: MIT Signed-off-by: David Grisham <[email protected]>

dgrisham added 10 commits September 12, 2018 16:28

Read from stdin instead of passing --cmdFile.

6e521b4

License: MIT Signed-off-by: David Grisham <[email protected]>

Clean up terminator logic.

f3171f1

License: MIT Signed-off-by: David Grisham <[email protected]>

Abstract out bufio.Reader from the rest of the logic.

5d3b816

License: MIT Signed-off-by: David Grisham <[email protected]>

Support # at beginning of line for commenting.

9c94eaa

Signed-off-by: David Grisham <[email protected]> License: MIT

Use mattn/go-shellwords for parsing STDIN.

2e4a6e9

License: MIT Signed-off-by: David Grisham <[email protected]>

Gx'ify mattn/go-shellwords.

6a613e0

License: MIT Signed-off-by: David Grisham <[email protected]>

Check STDIN for data or named pipe on iptb run. If both empty, exit.

2a39bcc

Also mattn/go-shellwords -> gxed/go-shellwords. License: MIT Signed-off-by: David Grisham <[email protected]>

Print iptb run usage when no command is input.

cf4a66f

License: MIT Signed-off-by: David Grisham <[email protected]>

Check stdin in Before instead of Action.

611c391

License: MIT Signed-off-by: David Grisham <[email protected]>

Resolve TODO.

bfeda42

License: MIT Signed-off-by: David Grisham <[email protected]>

dgrisham force-pushed the feat/async-run-list branch from 46439dc to bfeda42 Compare September 12, 2018 22:30

travisperson suggested changes Sep 12, 2018

View reviewed changes

dgrisham added 2 commits September 12, 2018 21:25

Fix goroutine iterator race condition.

e2ba390

License: MIT Signed-off-by: David Grisham <[email protected]>

Docs: Add comprehensive description to iptb run command.

01a61ca

License: MIT Signed-off-by: David Grisham <[email protected]>

travisperson approved these changes Sep 14, 2018

View reviewed changes

Fix: Small typo in error output.

d4685c2

dgrisham force-pushed the feat/async-run-list branch from f36dad8 to d4685c2 Compare September 14, 2018 21:01

dgrisham merged commit 7928974 into ipfs:master Sep 14, 2018

dgrisham deleted the feat/async-run-list branch September 14, 2018 21:04

Input file with list of commands to run async on nodes #75

Input file with list of commands to run async on nodes #75

Conversation

dgrisham commented Sep 5, 2018

travisperson commented Sep 6, 2018

dgrisham commented Sep 6, 2018

dgrisham commented Sep 6, 2018

dgrisham commented Sep 6, 2018

davinci26 commented Sep 6, 2018

travisperson commented Sep 6, 2018

dgrisham commented Sep 7, 2018

dgrisham commented Sep 7, 2018

dgrisham commented Sep 7, 2018 • edited Loading

travisperson commented Sep 7, 2018

dgrisham commented Sep 7, 2018

dgrisham commented Sep 7, 2018

travisperson commented Sep 7, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dgrisham Sep 7, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dgrisham commented Sep 7, 2018

dgrisham Sep 9, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dgrisham commented Sep 9, 2018 • edited Loading

travisperson commented Sep 11, 2018

dgrisham commented Sep 11, 2018 • edited Loading

dgrisham commented Sep 12, 2018

travisperson left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dgrisham commented Sep 13, 2018

travisperson left a comment

Choose a reason for hiding this comment

dgrisham commented Sep 14, 2018 • edited Loading

travisperson commented Sep 14, 2018

dgrisham commented Sep 14, 2018 • edited Loading

dgrisham commented Sep 7, 2018 •

edited

Loading

dgrisham Sep 7, 2018 •

edited

Loading

dgrisham Sep 9, 2018 •

edited

Loading

dgrisham commented Sep 9, 2018 •

edited

Loading

dgrisham commented Sep 11, 2018 •

edited

Loading

dgrisham commented Sep 14, 2018 •

edited

Loading

dgrisham commented Sep 14, 2018 •

edited

Loading