Add `PersistentSubShell`-feature #596

christian-monch · 2024-01-15T14:19:05Z

This PR addresses issue #572

It provides a shell_connection-context manager that will start a shell, for example, a local ssh-client, and allows interaction with the shell-process. Interactions are started by sending a command, which will be executed in the shell. The user will then receive the command's stdout. When the command terminates, the user can retrieve the exit code of the command.

Usage example:

>>> from datalad_next.shell import shell
>>> with shell(['ssh', 'localhost']) as ssh:
...     r = ssh(b'ls -l /etc/passwd')
...     print(tuple(r))
...     print(r.returncode)
... 
(b'-rw-r--r-- 1 root root 2773 14. Nov 10:05 /etc/passwd\n',)
0

christian-monch · 2024-01-15T14:49:09Z

Multiple commands can be executed in the same shell-context, e.g.

>>> from datalad_next.shell import shell
>>> with shell(['ssh', 'localhost']) as ssh:
...     r = ssh(b'ls -l /etc/passwd')
...     print(tuple(r))
...     print(r.returncode)
...     r = ssh(b'ls -l /etc/fstab')
...     print(tuple(r))
...     print(r.returncode)
... 
(b'-rw-r--r-- 1 root root 2773 14. Nov 10:05 /etc/passwd\n',)
0
(b'-rw-r--r-- 1 root root 1534 15. Sep 2022 /etc/fstab')
0

If a command in the shell exits with a non-zero return-code a CommandError is raised (this is done after all output of the command has been read, because the exit code is reported after the command output).

>>> with shell(['ssh', 'localhost']) as ssh:
...     r = ssh(b'ls no-such-file')
...     print(tuple(r))
... 
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/usr/lib64/python3.10/_collections_abc.py", line 330, in __next__
    return self.send(None)
  File "/home/cristian/Develop/datalad-next/datalad_next/shell/response_generators.py", line 155, in send
    self.check_result()
  File "/home/cristian/Develop/datalad-next/datalad_next/shell/response_generators.py", line 52, in check_result
    raise CommandError(
datalad.runner.exception.CommandError: CommandError: 'ls no-such-file ; x=$?; echo -e -n "----datalad-end-marker-3633463527-rekram-dne-dalatad----\n"; echo $x
' failed with exitcode 2 [err: 'Pseudo-terminal will not be allocated because stdin is not a terminal.
ls: Zugriff auf 'no-such-file' nicht möglich: Datei oder Verzeichnis nicht gefunden']
>>>

The commands that are executed in the shell can read their stdin from a local iterator:

>>> with shell(['ssh', 'localhost']) as ssh:
...     r = ssh(b'hexdump -n 10 -C', stdin=iter([b'0123456789']))
...     print(tuple(r))
...     r = ssh(b'hexdump -n 5 -C', stdin=iter([b'abcde']))
...     print(tuple(r))
... 
(b'00000000  30 31 32 33 34 35 36 37  38 39                    |0123456789|\n0000000a\n',)
(b'00000000  61 62 63 64 65                                    |abcde|\n00000005\n',)

A shell supports different underlying shell-programs and different higher-level operations, e.g. download. This is implemented via Response Generators. A response generator has two tasks:

Take a command, e.g. b'ls -l /etc/passwd', and augment it in such a way that all necessary information can be retrieved from the output of the augmented command. For example, the response generator might augment the command ls -l /etc/passwd to print an output-end-marker and the return code of the command after the command output. You can see an example of the augmented command in the CommandError-message shown above, i.e. ls no-such-file ; x=$?; echo -e -n "----datalad-end-marker-3633463527-rekram-dne-dalatad----\n"; echo $x.
Read and interpret the output from the shell to yield all output that a command writes to its stdout and to read the return code of the command. The structure that is expected by this step must be created by the augmented command that is create in task 1.

The PR contains a few useful response generator classes. Among them are VariableLengthResponseGenerator-classes, which can be used to handle the output of commands of an unknown length, e.g. ls -R. They are quite versatile and can be used with any command. There are concrete subclasses of VariableLengthResponseGenerator: VariableLengthResponseGeneratorPosix and VariableLengthResponseGeneratorPowerShell. The first is the default-value of the zero_command_rg_class parameter of shell() (the response generator given in this parameter is used internally to skip the login messages of the shell). The provided class is also the default response generator class for all further command invocations in the shell (although that can be overwritten on a call-by-call basis).

VariableLengthResponseGeneratorPosix supports Posix-like shells, e.g. bash, sh, etc. To execute commands in a PowerShell, one can specify VariableLengthResponseGeneratorPowerShell as value of zero_command_rg_class. For example, the following would create a shell-context that executes commands in a Powershell-shell under Windows:

>>> with shell(['powershell', '-Command', '-'], zero_command_rg_class=VariableLengthResponseGeneratorPowerShell) as pwsh:
...     r = pwsh(b'Get-ChildItem -Path \\')
...     print(b''.join(r).decode())
... 


    Directory: C:\


Mode                 LastWriteTime         Length Name                                                                 
----                 -------------         ------ ----                                                                 
d-----        12/15/2023  11:38 AM                DLTMP                                                                
d-r---        12/15/2023  11:09 AM                Program Files                                                        
d-r---         9/21/2023   6:54 AM                Program Files (x86)                                                  
d-----         12/6/2023  11:56 AM                TMP                                                                  
d-r---          9/1/2023  10:29 AM                Users                                                                
da----          1/4/2024  11:04 AM                Windows                                                              
-a----         5/12/2023   4:28 AM         112080 appverifUI.dll                                                       
-a----         5/12/2023   4:29 AM          66160 vfcompat.dll

Further response generators are: FixedLengthResponseGeneratorPosix and FixedLengthResponseGeneratorPowerShell, which can be used, if the length of the output a command is known a priori, e.g. cat /etc/passwd. The fixed-length response generators deliver better output-reading performance and they do not rely on an end-marker to determine the end of the output of the command.

For POSIX-shells there is a DownloadResponseGeneratorPosix that automatically determines the size of a remote file and uses a fixed-size download.

codecov · 2024-01-15T17:14:02Z

Codecov Report

Attention: Patch coverage is 97.27768% with 15 lines in your changes are missing coverage. Please review.

Project coverage is 93.24%. Comparing base (aefe74c) to head (1f34132).
Report is 8 commits behind head on main.

❗ Current head 1f34132 differs from pull request most recent head 9d4ae1a. Consider uploading reports for the commit 9d4ae1a to get more accurate results

Files	Patch %	Lines
datalad_next/shell/response_generators.py	95.76%	4 Missing and 1 partial ⚠️
datalad_next/shell/operations/common.py	91.66%	2 Missing and 2 partials ⚠️
datalad_next/shell/operations/posix.py	95.12%	1 Missing and 1 partial ⚠️
...talad_next/shell/tests/test_response_generators.py	95.83%	1 Missing and 1 partial ⚠️
...ad_next/iterable_subprocess/iterable_subprocess.py	92.85%	1 Missing ⚠️
datalad_next/shell/tests/test_shell.py	99.51%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #596      +/-   ##
==========================================
+ Coverage   93.05%   93.24%   +0.18%     
==========================================
  Files         171      178       +7     
  Lines       11945    12478     +533     
  Branches     1806     1890      +84     
==========================================
+ Hits        11116    11635     +519     
- Misses        642      651       +9     
- Partials      187      192       +5

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

datalad_next/utils/shell_connection.py

mih

Thanks! I played with it and it works for me.

Here are some first thoughts.

I think we should carefully decide what features go into the base class implementation. From my POV there is nothing inherently "SSH" about this functionality and also nothing inherently "POSIX".

The present up/download/delete implementation assumes a POSIX environment, but this is certainly not without alternative. This suggests that this feature should be in a separate class.

datalad_next/utils/__init__.py

datalad_next/utils/shell_connection.py

This commit addresses the bug described in: <datalad#596 (comment)>. The reason for the `AttributeError` was that un-caught `StopIteration` exceptions prevented an initialization of the `returncode`-attribute. This is fixed by initializing the attribute in the constructor of `ShellCommandResponseGenerator`. The commit adds a regression test that ensures that the correct error is signaled.

christian-monch · 2024-01-22T07:59:42Z

Thanks! I played with it and it works for me.

Here are some first thoughts.

I think we should carefully decide what features go into the base class implementation. From my POV there is nothing inherently "SSH" about this functionality and also nothing inherently "POSIX".

I see it the same way.

The present up/download/delete implementation assumes a POSIX environment, but this is certainly not without alternative. [...]

I interpreted issue #572 as assuming a POSIX environment on the server.

[...] This suggests that this feature should be in a separate class.

Yes, I agree. Independent of the server environment, it would be better to factor out individual operation implementations and allow for flexible "mix and match". Will do that now. The module shell_operations contains now the operations download, upload, and delete for shells that are executed on POSIX systems and have the commands head, base64, and rm installed.

This commit addresses reviewer comment <datalad#596 (comment)>. It exposes `chunk_size` in `shell_connection` and documents its significance for the size of the `stderr`-queue that is accessible via the `stderr_deque`-attribute of a `ShellCommandResponseGenerator`-object.

This commit addresses reviewer comment: <datalad#596 (comment)> It renames `ShellCommandExecutor.execute_command` to `ShellCommandExecutor.__call__`. With that change the following code becomes possible: >>> with shell_connection(['bash']) as shell: ... r = shell(b'uname')

This commit addresses reviewer comment: <datalad#596 (comment)>. It renames `datalad_next.utils.shell_connection.shell_connection` to `datalad_next.utils.shell.shell`. It also renames `datalad_next.utils.tests.test_shell_connection` to `datalad_next.utils.tests.test_shell`.

christian-monch · 2024-01-23T08:56:23Z

[...]
I think we should carefully decide what features go into the base class implementation. From my POV there is nothing inherently "SSH" about this functionality and also nothing inherently "POSIX".

The present up/download/delete implementation assumes a POSIX environment, but this is certainly not without alternative. This suggests that this feature should be in a separate class.

Just to ensure that I interpret this correctly. We would like to support shell-context objects on Posix, OSX, and Windows clients (which is the case because no OS-specific code is used in the implementation)!?

Supported client environments:

Posix
OSX
Windows (if a single command connects to a Posix-like shell, e.g. ssh.exe <some.linux.server>)

Server side

"Server side" refers to the environment in which the connected shell is executed. This environment mainly impacts two things. Firstly, how commands are augmented to allow output and return code detection. Secondly, which commands are sent to perform certain operations. My understanding was that we focus on Posix-server side, i.e. we connect to shells in a Posix environment (in fact the command-augmentation and the initial command that is used to skip login-messages are Posix-specific). So the following is the current state of support:

Supported server environments:

Posix
OSX
Windows

I interpret the comment in such a way that we want to support Windows and OSX server (or at least create the possibility to support then in the future) too. Is that correct? ping @mih

This commit addresses the bug described in: <datalad#596 (comment)>. The reason for the `AttributeError` was that un-caught `StopIteration` exceptions prevented an initialization of the `returncode`-attribute. This is fixed by initializing the attribute in the constructor of `ShellCommandResponseGenerator`. The commit adds a regression test that ensures that the correct error is signaled.

This commit addresses reviewer comment <datalad#596 (comment)>. It exposes `chunk_size` in `shell_connection` and documents its significance for the size of the `stderr`-queue that is accessible via the `stderr_deque`-attribute of a `ShellCommandResponseGenerator`-object.

This commit addresses reviewer comment: <datalad#596 (comment)> It renames `ShellCommandExecutor.execute_command` to `ShellCommandExecutor.__call__`. With that change the following code becomes possible: >>> with shell_connection(['bash']) as shell: ... r = shell(b'uname')

This commit addresses reviewer comment: <datalad#596 (comment)>. It renames `datalad_next.utils.shell_connection.shell_connection` to `datalad_next.utils.shell.shell`. It also renames `datalad_next.utils.tests.test_shell_connection` to `datalad_next.utils.tests.test_shell`.

mih · 2024-04-13T10:45:28Z

Notes on a conversation with @christian-monch in another channel. This PR is considered done. The issue of client code having to take care of the features of the shell that is being connected to is not to be addressed within this PR.

A review needs to pay attention to properly labeling of implementations (e.g., check that the posix operations actually work in any POSIX environment, and not just a subset of shells found in a POSIX environment).

This PR is self-contained, i.e. has the feature implementation and tests for it. However, the feature is added only, and no existing alternative implementations are consolidated on it. A reviewer needs to go through these existing implementations and determine whether they can be replaced with this feature, and if not, why not (candidates to look at: uncurl, (ssh) test helpers (and potentially others), url_operations).

datalad_next/shell/operations/posix.py

mih · 2024-04-13T12:33:32Z

datalad_next/shell/__init__.py

+from .operations import posix
+from .operations.posix import (
+    DownloadResponseGenerator,
+    DownloadResponseGeneratorPosix,
+)
+from .response_generators import (
+    FixedLengthResponseGenerator,
+    FixedLengthResponseGeneratorPosix,
+    FixedLengthResponseGeneratorPowerShell,
+    ShellCommandResponseGenerator,
+    VariableLengthResponseGenerator,
+    VariableLengthResponseGeneratorPosix,
+    VariableLengthResponseGeneratorPowerShell,
+)


Should there be a common naming/selection strategy?

Client code must choose the current implementation for there use case, but it seems a simple label for a shell-setup is not sufficient. We have a posix operations module, but no one for response generators.

Why not have one posix module and it has all pieces needed?

Along the same lines, I think we should also avoid that client code has to switch class names per use case. Now thye need to select a module to import from and what to import also. This means that client code also needs to maintain a mapping of the API. If we would provide modules with the exact same API, client code could be simplified to just get the label of the "platform/shell-type" module to work with.

I also want to point out that the promise is that everything imported from datalad_next.shell is stable (or needs a minor/major release to change). We import the entire posix module, which not only has the pieces that aim to be a "public" API. I think this calls for a dedicated module that exposes the API.

mih

I looked into replacing the implementation of assert_ssh_access with this functionality. Here is some code that I came up with, based on the existing documentation

    ssh_bash_call = [
        ssh_bin,
        '-i', seckey,
        '-p', port,
        f'{login}@{host}',
        'bash',
    ]
    from more_itertools import consume
    with shell(ssh_bash_call) as ssh:
        consume(ssh(f'mkdir -p {path}'))
        consume(ssh(f'touch {path}/datalad-tests-probe'))

Two main issues confused me. The top-level documentation makes the impression that all commands need to be byte-strings (so no f-string formatting is possible). But fortunately this is not true.

It took me a moment to understand the implications of each line of the doc example shell().

print(b''.join(result))

Is not just a demo. But the command associated with result does not actually run without it.

I think that ShellCommandExecutor.__call__ should not have this behavior. myshell(something) appears to execute a command, but it really gives a handler that needs more boilerplate code to actually do things. At minimum something like

consume(myshell(something))

I believe __call__() should better do that. Maybe have a check= like subprocess.run().

Strangely, when I run a command that fails, I get a CommandError immediately. The docs suggest that there should be no exception, but I have to inspect a returncode manually.

In some sense I am glad that I do not have to implement a custom error handler for each execution, but I am also not sure if I am doing things correctly.

Overall, I feel we need more examples in the top-level docs. Use cases are similar to the call_git helpers:

run a command that is expected to work, raise exception if not
check if a command fails or not, return bool
run a command that is expected to fail, act on returncode
run a command that produces output, variants
- expect and produce one line, fail if more
- expect and produce any number of lines

christian-monch · 2024-04-15T20:01:38Z

I looked into replacing the implementation of assert_ssh_access with this functionality. Here is some code that I came up with, based on the existing documentation
    ssh_bash_call = [
        ssh_bin,
        '-i', seckey,
        '-p', port,
        f'{login}@{host}',
        'bash',
    ]
    from more_itertools import consume
    with shell(ssh_bash_call) as ssh:
        consume(ssh(f'mkdir -p {path}'))
        consume(ssh(f'touch {path}/datalad-tests-probe'))
Two main issues confused me. The top-level documentation makes the impression that all commands need to be byte-strings (so no f-string formatting is possible). But fortunately this is not true.

A thanks for catching that, I modified the code according to a reviewer-comment and forgot to change the docs.

It took me a moment to understand the implications of each line of the doc example shell().

print(b''.join(result))

Is not just a demo. But the command associated with result does not actually run without it.

I think that ShellCommandExecutor.__call__ should not have this behavior. myshell(something) appears to execute a command, but it really gives a handler that needs more boilerplate code to actually do things. At minimum something like

consume(myshell(something))

I believe __call__() should better do that. Maybe have a check= like subprocess.run().

We can do that, it would mean all output is assembled before it is returned to the user. I assume that we don't expect long slow outputs, e.g. find /, so that would be OK. What do you think about collecting all output in __call__() and adding a second method that would return a generator?

Strangely, when I run a command that fails, I get a CommandError immediately. The docs suggest that there should be no exception, but I have to inspect a returncode manually.

Thanks, will fix the docs (also an oversight after addressing a reviewer comment).

In some sense I am glad that I do not have to implement a custom error handler for each execution, but I am also not sure if I am doing things correctly.

Overall, I feel we need more examples in the top-level docs. Use cases are similar to the call_git helpers:

run a command that is expected to work, raise exception if not

check if a command fails or not, return bool

run a command that is expected to fail, act on returncode

run a command that produces output, variants

expect and produce one line, fail if more

expect and produce any number of lines

These are very useful suggestions, I suggest doing the following:

Fix the documentation regarding input types and exceptions
modify __call__ to:
2.1. accept a check-argument (default: False) that determines whether an exception is raised when the remote command fails.
2.2. collect all output and return a result dict or dataclass that contains stdout and the return code (unless check is True and the remote command exits with non-zero.
add another execution call that returns a generator (useful for timely output processing of long running remote commands)
add more examples to top level documentation according to the list above.

See datalad#654 Like require employing datalad#596 to solve properly.

TODO: update docstrings 1. make shell.__call__ consume everything 2. add shell.start 3. don't raise CommandError by default 4. add check to enable CommandError-raising 5. small improvements 6. extend tests

Previously, this has been a series of SSH calls, each running in its own shell. Now a persistent shell is used to interleave remote and local checks.

This commit adapts the changes made by mih in his forked branch to the changes that were made in this branch in the meantime.

Co-authored-by: Michael Hanke <[email protected]>

mih · 2024-04-18T06:47:10Z

I will merge this now. There are some unresolved things and tests should still run. But there are too many things pending and we will work on them in parallel for better throughput.

christian-monch force-pushed the ssh-shell branch from 55c682a to 90b5547 Compare January 15, 2024 16:39

christian-monch force-pushed the ssh-shell branch from 90b5547 to f19e680 Compare January 15, 2024 17:22

adswa reviewed Jan 17, 2024

View reviewed changes

datalad_next/utils/shell_connection.py Outdated Show resolved Hide resolved

mih reviewed Jan 17, 2024

View reviewed changes

christian-monch force-pushed the ssh-shell branch from f19e680 to c08cfff Compare January 19, 2024 10:38

mih added this to the 1.2 milestone Jan 21, 2024

This was referenced Jan 26, 2024

Implement DELETE for SSHUrlOperations #533

Closed

Implement "force" for UrlOperations.delete|upload() #454

Open

christian-monch force-pushed the ssh-shell branch 10 times, most recently from ad0b555 to de9a949 Compare February 1, 2024 14:04

christian-monch mentioned this pull request Mar 25, 2024

[Experimental] multiflavor ORA-remote datalad/datalad-ria#106

Draft

christian-monch added 3 commits March 26, 2024 14:56

fix a comment

620bbe1

add a -currently unused- credential argument

028e752

fix information in comments

9d4ae1a

mih reviewed Apr 13, 2024

View reviewed changes

This was referenced Apr 17, 2024

Fix RIA store operations from Mac clients #653

Merged

ORA remote initialization stalls on windows #654

Closed

mih added a commit to mih/datalad-next that referenced this pull request Apr 17, 2024

Do not run the RIA test on windows (yet)

dc779fa

See datalad#654 Like require employing datalad#596 to solve properly.

address reviewer comments

d006050

TODO: update docstrings 1. make shell.__call__ consume everything 2. add shell.start 3. don't raise CommandError by default 4. add check to enable CommandError-raising 5. small improvements 6. extend tests

mih mentioned this pull request Apr 17, 2024

Provide replacement for SSHRemoteIO based on shell #655

Merged

6 tasks

update examples in docstrings

1a99586

christian-monch force-pushed the ssh-shell branch from 96310c9 to 1a99586 Compare April 17, 2024 12:29

christian-monch and others added 8 commits April 17, 2024 14:58

update documentation

ba8985e

improve doc-strings, remove unused code

e68aa0d

Employ shell feature in SSH test fixture

7b5635e

Previously, this has been a series of SSH calls, each running in its own shell. Now a persistent shell is used to interleave remote and local checks.

adapt mih's changes to changed code

c1a2dcf

This commit adapts the changes made by mih in his forked branch to the changes that were made in this branch in the meantime.

fix a typo

2117d65

fix escape in docstrings

04b4140

fix type issues and a comment

bcb39b8

fix windows tests

79436f8

mih mentioned this pull request Apr 18, 2024

Add examples to shell documentation #656

Closed

Update datalad_next/shell/operations/posix.py

ce445a8

Co-authored-by: Michael Hanke <[email protected]>

mih merged commit 5a12b73 into datalad:main Apr 18, 2024
3 of 6 checks passed

This was referenced Apr 23, 2024

PersistentSubShell feature #572

Closed

Failing powershell tests #662

Closed

christian-monch deleted the ssh-shell branch July 16, 2024 10:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `PersistentSubShell`-feature #596

Add `PersistentSubShell`-feature #596

christian-monch commented Jan 15, 2024 •

edited

Loading

christian-monch commented Jan 15, 2024 •

edited

Loading

codecov bot commented Jan 15, 2024 •

edited

Loading

mih left a comment

christian-monch commented Jan 22, 2024 •

edited

Loading

christian-monch commented Jan 23, 2024 •

edited

Loading

mih commented Apr 13, 2024

mih Apr 13, 2024

mih left a comment •

edited

Loading

christian-monch commented Apr 15, 2024

mih commented Apr 18, 2024

Add PersistentSubShell-feature #596

Add PersistentSubShell-feature #596

Conversation

christian-monch commented Jan 15, 2024 • edited Loading

christian-monch commented Jan 15, 2024 • edited Loading

codecov bot commented Jan 15, 2024 • edited Loading

Codecov Report

mih left a comment

Choose a reason for hiding this comment

christian-monch commented Jan 22, 2024 • edited Loading

christian-monch commented Jan 23, 2024 • edited Loading

Supported client environments:

Server side

Supported server environments:

mih commented Apr 13, 2024

mih Apr 13, 2024

Choose a reason for hiding this comment

mih left a comment • edited Loading

Choose a reason for hiding this comment

christian-monch commented Apr 15, 2024

mih commented Apr 18, 2024

Add `PersistentSubShell`-feature #596

Add `PersistentSubShell`-feature #596

christian-monch commented Jan 15, 2024 •

edited

Loading

christian-monch commented Jan 15, 2024 •

edited

Loading

codecov bot commented Jan 15, 2024 •

edited

Loading

christian-monch commented Jan 22, 2024 •

edited

Loading

christian-monch commented Jan 23, 2024 •

edited

Loading

mih left a comment •

edited

Loading