Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement new run_shell_cmd function with cleaner API to replace run_cmd + run_cmd_qa #4252

Closed
boegel opened this issue May 3, 2023 · 8 comments
Milestone

Comments

@boegel
Copy link
Member

boegel commented May 3, 2023

The current run_cmd and run_cmd_qa functions have a very confusing API, and their implementation is ancient (dates back from a time when we still supported Python 2.4).

We should implement a new run function, which supports running both interactive and non-interactive commands, as well as more advanced features like running a command asynchronously, and providing support to drop into an interactive shell when a command failed.

@boegel
Copy link
Member Author

boegel commented Jun 22, 2023

I had a quick brainstorm on this with @lexming, here's the plan we came up with...

Goals

  • one single run function that supports running (interactive) shell commands (incl. Lmod)
  • sane API + defaults
  • supports everything that current run_cmd and run_cmd_qa functions do
  • enables new features like:
    • context-aware error parsing & reporting
    • wrapper functions for common commands like run_make, run_cmake, etc.
    • opt-in to jumping into an interactive shell session if command failed

Current run_cmd and run_cmd_qa functions

def run_cmd(cmd, log_ok=True, log_all=False, simple=False, inp=None, regexp=True, log_output=False, path=None,
            force_in_dry_run=False, verbose=True, shell=None, trace=True, stream_output=None, asynchronous=False):
    """
    Run specified command (in a subshell)
    :param cmd: command to run
    :param log_ok: only run output/exit code for failing commands (exit code non-zero)
    :param log_all: always log command output and exit code
    :param simple: if True, just return True/False to indicate success, else return a tuple: (output, exit_code)
    :param inp: the input given to the command via stdin
    :param regexp: regex used to check the output for errors;  if True it will use the default (see parse_log_for_error)
    :param log_output: indicate whether all output of command should be logged to a separate temporary logfile
    :param path: path to execute the command in; current working directory is used if unspecified
    :param force_in_dry_run: force running the command during dry run
    :param verbose: include message on running the command in dry run output
    :param shell: allow commands to not run in a shell (especially useful for cmd lists), defaults to True
    :param trace: print command being executed as part of trace output
    :param stream_output: enable streaming command output to stdout
    :param asynchronous: run command asynchronously (returns subprocess.Popen instance if set to True)
    """
def run_cmd_qa(cmd, qa, no_qa=None, log_ok=True, log_all=False, simple=False, regexp=True, std_qa=None, path=None,
               maxhits=50, trace=True):
    """
    Run specified interactive command (in a subshell)
    :param cmd: command to run
    :param qa: dictionary which maps question to answers
    :param no_qa: list of patters that are not questions
    :param log_ok: only run output/exit code for failing commands (exit code non-zero)
    :param log_all: always log command output and exit code
    :param simple: if True, just return True/False to indicate success, else return a tuple: (output, exit_code)
    :param regexp: regex used to check the output for errors; if True it will use the default (see parse_log_for_error)
    :param std_qa: dictionary which maps question regex patterns to answers
    :param path: path to execute the command is; current working directory is used if unspecified
    :param maxhits: maximum number of cycles (seconds) without being able to find a known question
    :param trace: print command being executed as part of trace output
    """

Design for run function

Default usage:

run(cmd)
  • exits with error on non-zero exit code
  • always returns output (stdout+stderr combined) + exit code
    • always return named tuple as output value (less room for typos, immutable)
      (output="all output", exit_code=0, stderr=None)
  • always log output + exit code (only with log.info)

Example:

res = run(cmd)
res.output
res.exit_code

Options:

  • don't fail on non-zero exit code (fail_on_error=True by default)
  • split stdout/stderr (split_stderr=False by default)
    • return value:
      (output="stdout output", exit_code=0, stderr="errors + warnings")
  • don't show command in terminal (for trace or dry run output) (hidden=False by default)
    • hidden=True is combo of trace=False + verbose=False in run_cmd
  • input to be sent to stdin (stdin=None by default)
    • like inp in run_cmd
  • also run command in dry run mode (in_dry_run=False by default)
    • like force_in_dry_run in run_cmd
  • location to run command in (work_dir=None by default, implies current working directory)
    • like path in run_cmd
  • do not run in shell (shell=True by default)
    • like shell in run_cmd (matches with shell in subprocess.Popen)
  • enable logging of output to temporary file (output_file=False by default)
    • like log_output in run_cmd
    • rename tmp file to .out instead of .log
    • no timestamps, so not really a log
  • enabling streaming of output to stdout (stream_output=False by default)
    • like stream_output in run_cmd
  • run command asynchronously (asynchronous=False by default)
    • like asynchronous in run_cmd
    • (can't use async=False since async is a Python keyword)

For interactive commands:

  • qa_patterns to provide question/answer mapping, questions can be regular expressions (qa=None by default)
    • combo of qa and std_qa in run_cmd_qa
    • qa should be a list of tuples ("question", "answer"), since we want to control order
    • question could be a string value (exact match (end-of-line?), after re.escape) or a compiled regex (regex match)
  • qa_wait_patterns to provide list of non-question patterns and how long to allow them
    • example: [("not a question", 50)]
    • combo of no_qa and maxhits in run_cmd_qa

Not replacing:

  • cryptic log_ok and log_all options
  • simple option to only return True/False
  • regexp to specify custom error patterns (only used for logging currently)

Function signature:

def run(cmd, fail_on_error=True, split_stderr=False, stdin=None,
        hidden=False, in_dry_run=False, work_dir=None, shell=True,
        output_file=False, stream_output=False, asynchronous=False,
        qa_patterns=None, qa_wait_patterns=None)

@Micket
Copy link
Contributor

Micket commented Jul 5, 2023

So, initial condition for this to even be considered is that we are about the abort:

  1. fail_on_error is true
  2. and it did fail

But additionally:

  1. It's actually a tty
  2. The global config option to enter a shell on failure is set shell_on_fail?
  3. The run command hasn't opted out of this explicitly with some argument, perhaps also named shell_on_fail=False
if sys.stdout.isatty() and shell_on_fail and cfg.shell_on_fail:
    print("Dropping into debug shell, exit to return")
    pty.spawn('bash')

edit: maybe the global config option should be the name of the shell used? to allow for bash/sh/zsh or whatever?
If so, perhaps best to not name the boolean argument to the run function and the global config option of what shell, if any, to use.

if sys.stdout.isatty() and !no_shell_on_fail and cfg.shell_on_fail is not None:
    print("Dropping into debug shell, exit to return")
    pty.spawn(cfg.shell_on_fail)

edit: or maybe

if sys.stdout.isatty() and shell_on_fail and cfg.shell_on_fail:
    print("Dropping into debug shell, exit to return")
    pty.spawn(os.environ.get('SHELL'))
)

and maybe we should avoid sourcing the environment for the user. Might be hard to support arbitrary shells though.

if sys.stdout.isatty() and shell_on_fail and cfg.shell_on_fail:
    print("Dropping into debug shell, exit to return")
    os.environ['PS1'] = 'EB debug shell@\h# '
    pty.spawn(['bash', '--noprofile', '--norc'])

@lexming
Copy link
Contributor

lexming commented Jul 5, 2023

Dropping into a totally new shell might not be very useful. We can manually replicate the initial build environment but we will never know what was the environment at the time of the crash. Imagine, for instance, something in the build scripts of the software altering CFLAGS for instance.

We should instead first create a new shell, run the commands in it and if anything fails drop the user into that same shell. Not tested, but something as follows might work:

# create pseudo-terminal
master_fd, slave_fd = pty.openpty()
# run command in it
_log.info(f"Running command '{cmd_msg}' in {work_dir}")
proc = subprocess.run(cmd, stdin=slave_fd, stdout=slave_fd,  stderr=slave_fd)
# drop user into it if fail
if sys.stdout.isatty() and shell_on_fail and cfg.shell_on_fail:
    print("Dropping into debug shell, exit to return")
    pty.spawn(os.environ.get('SHELL'), master_fd, slave_fd)

@Micket
Copy link
Contributor

Micket commented Jul 6, 2023

Dropping into a totally new shell might not be very useful.

is it? It would drop you into the exact same environment from which the command ran, as if you had done all the steps of the easyblock manually so far. All environment variables that EB sets by default + whatever the easyblock has done up til now, and you'd see something like the last command that failed:

export BAR=xxx && export FOO=yyy && ./configure --with-foo=zzz
# or
export BAR=xxx && make -j 10 FOO=123

one could then easily try a few different configure flags, or whatever i might need to find a solution to whatever failed.

Imagine, for instance, something in the build scripts of the software altering CFLAGS for instance.

But running a script, or any program, won't affect the parent shell it was executed from. The only difference i see you could ever make is to include the extra exports (which you would see from the command which just ran) that we often put into the command string.
It does open questions on how we would handle also logging of stdout from the original command here.

I tried to investigate how it behaves with a small script, but pty.spawn doesn't handle actually take those arguments as such so that spawn just breaks the terminal. Couldn't figure out how it would work.

@boegel
Copy link
Member Author

boegel commented Jul 6, 2023

@Micket @lexming This whole discussion belongs in #3950?

@Micket
Copy link
Contributor

Micket commented Jul 6, 2023

Well, part of it was to add some flag for this run interface. I suggest shell_on_fail (default True)

@boegel boegel changed the title implement new run function with cleaner API to replace run_cmd + run_cmd_qa implement new run_shell_cmd function with cleaner API to replace run_cmd + run_cmd_qa Sep 6, 2023
@boegel
Copy link
Member Author

boegel commented Sep 6, 2023

Note: run was renamed to run_shell_cmd in #4335 to avoid confusion

@boegel boegel added the EasyBuild-5.0-blocker Blocker for EasyBuild 5.0 label Jun 5, 2024
@boegel boegel closed this as completed Jun 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants