Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PersistentSubShell-feature #596

Merged
merged 30 commits into from
Apr 18, 2024
Merged

Add PersistentSubShell-feature #596

merged 30 commits into from
Apr 18, 2024

Conversation

christian-monch
Copy link
Contributor

@christian-monch christian-monch commented Jan 15, 2024

This PR addresses issue #572

It provides a shell_connection-context manager that will start a shell, for example, a local ssh-client, and allows interaction with the shell-process. Interactions are started by sending a command, which will be executed in the shell. The user will then receive the command's stdout. When the command terminates, the user can retrieve the exit code of the command.

Usage example:

>>> from datalad_next.shell import shell
>>> with shell(['ssh', 'localhost']) as ssh:
...     r = ssh(b'ls -l /etc/passwd')
...     print(tuple(r))
...     print(r.returncode)
... 
(b'-rw-r--r-- 1 root root 2773 14. Nov 10:05 /etc/passwd\n',)
0

@christian-monch
Copy link
Contributor Author

christian-monch commented Jan 15, 2024

Multiple commands can be executed in the same shell-context, e.g.

>>> from datalad_next.shell import shell
>>> with shell(['ssh', 'localhost']) as ssh:
...     r = ssh(b'ls -l /etc/passwd')
...     print(tuple(r))
...     print(r.returncode)
...     r = ssh(b'ls -l /etc/fstab')
...     print(tuple(r))
...     print(r.returncode)
... 
(b'-rw-r--r-- 1 root root 2773 14. Nov 10:05 /etc/passwd\n',)
0
(b'-rw-r--r-- 1 root root 1534 15. Sep 2022 /etc/fstab')
0

If a command in the shell exits with a non-zero return-code a CommandError is raised (this is done after all output of the command has been read, because the exit code is reported after the command output).

>>> with shell(['ssh', 'localhost']) as ssh:
...     r = ssh(b'ls no-such-file')
...     print(tuple(r))
... 
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/usr/lib64/python3.10/_collections_abc.py", line 330, in __next__
    return self.send(None)
  File "/home/cristian/Develop/datalad-next/datalad_next/shell/response_generators.py", line 155, in send
    self.check_result()
  File "/home/cristian/Develop/datalad-next/datalad_next/shell/response_generators.py", line 52, in check_result
    raise CommandError(
datalad.runner.exception.CommandError: CommandError: 'ls no-such-file ; x=$?; echo -e -n "----datalad-end-marker-3633463527-rekram-dne-dalatad----\n"; echo $x
' failed with exitcode 2 [err: 'Pseudo-terminal will not be allocated because stdin is not a terminal.
ls: Zugriff auf 'no-such-file' nicht möglich: Datei oder Verzeichnis nicht gefunden']
>>> 

The commands that are executed in the shell can read their stdin from a local iterator:

>>> with shell(['ssh', 'localhost']) as ssh:
...     r = ssh(b'hexdump -n 10 -C', stdin=iter([b'0123456789']))
...     print(tuple(r))
...     r = ssh(b'hexdump -n 5 -C', stdin=iter([b'abcde']))
...     print(tuple(r))
... 
(b'00000000  30 31 32 33 34 35 36 37  38 39                    |0123456789|\n0000000a\n',)
(b'00000000  61 62 63 64 65                                    |abcde|\n00000005\n',)

A shell supports different underlying shell-programs and different higher-level operations, e.g. download. This is implemented via Response Generators. A response generator has two tasks:

  1. Take a command, e.g. b'ls -l /etc/passwd', and augment it in such a way that all necessary information can be retrieved from the output of the augmented command. For example, the response generator might augment the command ls -l /etc/passwd to print an output-end-marker and the return code of the command after the command output. You can see an example of the augmented command in the CommandError-message shown above, i.e. ls no-such-file ; x=$?; echo -e -n "----datalad-end-marker-3633463527-rekram-dne-dalatad----\n"; echo $x.

  2. Read and interpret the output from the shell to yield all output that a command writes to its stdout and to read the return code of the command. The structure that is expected by this step must be created by the augmented command that is create in task 1.

The PR contains a few useful response generator classes. Among them are VariableLengthResponseGenerator-classes, which can be used to handle the output of commands of an unknown length, e.g. ls -R. They are quite versatile and can be used with any command. There are concrete subclasses of VariableLengthResponseGenerator: VariableLengthResponseGeneratorPosix and VariableLengthResponseGeneratorPowerShell. The first is the default-value of the zero_command_rg_class parameter of shell() (the response generator given in this parameter is used internally to skip the login messages of the shell). The provided class is also the default response generator class for all further command invocations in the shell (although that can be overwritten on a call-by-call basis).

VariableLengthResponseGeneratorPosix supports Posix-like shells, e.g. bash, sh, etc. To execute commands in a PowerShell, one can specify VariableLengthResponseGeneratorPowerShell as value of zero_command_rg_class. For example, the following would create a shell-context that executes commands in a Powershell-shell under Windows:

>>> with shell(['powershell', '-Command', '-'], zero_command_rg_class=VariableLengthResponseGeneratorPowerShell) as pwsh:
...     r = pwsh(b'Get-ChildItem -Path \\')
...     print(b''.join(r).decode())
... 


    Directory: C:\


Mode                 LastWriteTime         Length Name                                                                 
----                 -------------         ------ ----                                                                 
d-----        12/15/2023  11:38 AM                DLTMP                                                                
d-r---        12/15/2023  11:09 AM                Program Files                                                        
d-r---         9/21/2023   6:54 AM                Program Files (x86)                                                  
d-----         12/6/2023  11:56 AM                TMP                                                                  
d-r---          9/1/2023  10:29 AM                Users                                                                
da----          1/4/2024  11:04 AM                Windows                                                              
-a----         5/12/2023   4:28 AM         112080 appverifUI.dll                                                       
-a----         5/12/2023   4:29 AM          66160 vfcompat.dll                                                         

Further response generators are: FixedLengthResponseGeneratorPosix and FixedLengthResponseGeneratorPowerShell, which can be used, if the length of the output a command is known a priori, e.g. cat /etc/passwd. The fixed-length response generators deliver better output-reading performance and they do not rely on an end-marker to determine the end of the output of the command.

For POSIX-shells there is a DownloadResponseGeneratorPosix that automatically determines the size of a remote file and uses a fixed-size download.

Copy link

codecov bot commented Jan 15, 2024

Codecov Report

Attention: Patch coverage is 97.27768% with 15 lines in your changes are missing coverage. Please review.

Project coverage is 93.24%. Comparing base (aefe74c) to head (1f34132).
Report is 8 commits behind head on main.

❗ Current head 1f34132 differs from pull request most recent head 9d4ae1a. Consider uploading reports for the commit 9d4ae1a to get more accurate results

Files Patch % Lines
datalad_next/shell/response_generators.py 95.76% 4 Missing and 1 partial ⚠️
datalad_next/shell/operations/common.py 91.66% 2 Missing and 2 partials ⚠️
datalad_next/shell/operations/posix.py 95.12% 1 Missing and 1 partial ⚠️
...talad_next/shell/tests/test_response_generators.py 95.83% 1 Missing and 1 partial ⚠️
...ad_next/iterable_subprocess/iterable_subprocess.py 92.85% 1 Missing ⚠️
datalad_next/shell/tests/test_shell.py 99.51% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #596      +/-   ##
==========================================
+ Coverage   93.05%   93.24%   +0.18%     
==========================================
  Files         171      178       +7     
  Lines       11945    12478     +533     
  Branches     1806     1890      +84     
==========================================
+ Hits        11116    11635     +519     
- Misses        642      651       +9     
- Partials      187      192       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@mih mih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I played with it and it works for me.

Here are some first thoughts.

I think we should carefully decide what features go into the base class implementation. From my POV there is nothing inherently "SSH" about this functionality and also nothing inherently "POSIX".

The present up/download/delete implementation assumes a POSIX environment, but this is certainly not without alternative. This suggests that this feature should be in a separate class.

datalad_next/utils/__init__.py Outdated Show resolved Hide resolved
datalad_next/utils/__init__.py Outdated Show resolved Hide resolved
datalad_next/utils/shell_connection.py Outdated Show resolved Hide resolved
datalad_next/utils/shell_connection.py Outdated Show resolved Hide resolved
datalad_next/utils/shell_connection.py Outdated Show resolved Hide resolved
datalad_next/utils/shell_connection.py Outdated Show resolved Hide resolved
datalad_next/utils/shell_connection.py Outdated Show resolved Hide resolved
christian-monch added a commit to christian-monch/datalad-next that referenced this pull request Jan 19, 2024
This commit addresses the bug described
in:
<datalad#596 (comment)>.

The reason for the `AttributeError` was that
un-caught `StopIteration` exceptions prevented
an initialization of the `returncode`-attribute.
This is fixed by initializing the attribute
in the constructor of
`ShellCommandResponseGenerator`.

The commit adds a regression test that ensures
that the correct error is signaled.
@mih mih added this to the 1.2 milestone Jan 21, 2024
@christian-monch
Copy link
Contributor Author

christian-monch commented Jan 22, 2024

Thanks! I played with it and it works for me.

Here are some first thoughts.

I think we should carefully decide what features go into the base class implementation. From my POV there is nothing inherently "SSH" about this functionality and also nothing inherently "POSIX".

I see it the same way.

The present up/download/delete implementation assumes a POSIX environment, but this is certainly not without alternative. [...]

I interpreted issue #572 as assuming a POSIX environment on the server.

[...] This suggests that this feature should be in a separate class.

Yes, I agree. Independent of the server environment, it would be better to factor out individual operation implementations and allow for flexible "mix and match". Will do that now. The module shell_operations contains now the operations download, upload, and delete for shells that are executed on POSIX systems and have the commands head, base64, and rm installed.

christian-monch added a commit to christian-monch/datalad-next that referenced this pull request Jan 22, 2024
This commit addresses reviewer comment
<datalad#596 (comment)>.
It exposes `chunk_size` in `shell_connection` and
documents its significance for the size of the
`stderr`-queue that is accessible via the
`stderr_deque`-attribute of a
`ShellCommandResponseGenerator`-object.
christian-monch added a commit to christian-monch/datalad-next that referenced this pull request Jan 22, 2024
This commit addresses reviewer comment:
<datalad#596 (comment)>
It renames `ShellCommandExecutor.execute_command` to
`ShellCommandExecutor.__call__`. With that change the
following code becomes possible:

>>> with shell_connection(['bash']) as shell:
...     r = shell(b'uname')
christian-monch added a commit to christian-monch/datalad-next that referenced this pull request Jan 22, 2024
This commit addresses reviewer comment:
<datalad#596 (comment)>.

It renames `datalad_next.utils.shell_connection.shell_connection`
to `datalad_next.utils.shell.shell`.

It also renames `datalad_next.utils.tests.test_shell_connection`
to `datalad_next.utils.tests.test_shell`.
@christian-monch
Copy link
Contributor Author

christian-monch commented Jan 23, 2024

[...]
I think we should carefully decide what features go into the base class implementation. From my POV there is nothing inherently "SSH" about this functionality and also nothing inherently "POSIX".

The present up/download/delete implementation assumes a POSIX environment, but this is certainly not without alternative. This suggests that this feature should be in a separate class.

Just to ensure that I interpret this correctly. We would like to support shell-context objects on Posix, OSX, and Windows clients (which is the case because no OS-specific code is used in the implementation)!?

Supported client environments:

  • Posix
  • OSX
  • Windows (if a single command connects to a Posix-like shell, e.g. ssh.exe <some.linux.server>)

Server side

"Server side" refers to the environment in which the connected shell is executed. This environment mainly impacts two things. Firstly, how commands are augmented to allow output and return code detection. Secondly, which commands are sent to perform certain operations. My understanding was that we focus on Posix-server side, i.e. we connect to shells in a Posix environment (in fact the command-augmentation and the initial command that is used to skip login-messages are Posix-specific). So the following is the current state of support:

Supported server environments:

  • Posix
  • OSX
  • Windows

I interpret the comment in such a way that we want to support Windows and OSX server (or at least create the possibility to support then in the future) too. Is that correct? ping @mih

christian-monch added a commit to christian-monch/datalad-next that referenced this pull request Jan 30, 2024
This commit addresses the bug described
in:
<datalad#596 (comment)>.

The reason for the `AttributeError` was that
un-caught `StopIteration` exceptions prevented
an initialization of the `returncode`-attribute.
This is fixed by initializing the attribute
in the constructor of
`ShellCommandResponseGenerator`.

The commit adds a regression test that ensures
that the correct error is signaled.
christian-monch added a commit to christian-monch/datalad-next that referenced this pull request Jan 30, 2024
This commit addresses reviewer comment
<datalad#596 (comment)>.
It exposes `chunk_size` in `shell_connection` and
documents its significance for the size of the
`stderr`-queue that is accessible via the
`stderr_deque`-attribute of a
`ShellCommandResponseGenerator`-object.
christian-monch added a commit to christian-monch/datalad-next that referenced this pull request Jan 30, 2024
This commit addresses reviewer comment:
<datalad#596 (comment)>
It renames `ShellCommandExecutor.execute_command` to
`ShellCommandExecutor.__call__`. With that change the
following code becomes possible:

>>> with shell_connection(['bash']) as shell:
...     r = shell(b'uname')
christian-monch added a commit to christian-monch/datalad-next that referenced this pull request Jan 30, 2024
This commit addresses reviewer comment:
<datalad#596 (comment)>.

It renames `datalad_next.utils.shell_connection.shell_connection`
to `datalad_next.utils.shell.shell`.

It also renames `datalad_next.utils.tests.test_shell_connection`
to `datalad_next.utils.tests.test_shell`.
@christian-monch christian-monch force-pushed the ssh-shell branch 10 times, most recently from ad0b555 to de9a949 Compare February 1, 2024 14:04
@mih
Copy link
Member

mih commented Apr 13, 2024

Notes on a conversation with @christian-monch in another channel. This PR is considered done. The issue of client code having to take care of the features of the shell that is being connected to is not to be addressed within this PR.

A review needs to pay attention to properly labeling of implementations (e.g., check that the posix operations actually work in any POSIX environment, and not just a subset of shells found in a POSIX environment).

This PR is self-contained, i.e. has the feature implementation and tests for it. However, the feature is added only, and no existing alternative implementations are consolidated on it. A reviewer needs to go through these existing implementations and determine whether they can be replaced with this feature, and if not, why not (candidates to look at: uncurl, (ssh) test helpers (and potentially others), url_operations).

datalad_next/shell/operations/posix.py Outdated Show resolved Hide resolved
Comment on lines +158 to +171
from .operations import posix
from .operations.posix import (
DownloadResponseGenerator,
DownloadResponseGeneratorPosix,
)
from .response_generators import (
FixedLengthResponseGenerator,
FixedLengthResponseGeneratorPosix,
FixedLengthResponseGeneratorPowerShell,
ShellCommandResponseGenerator,
VariableLengthResponseGenerator,
VariableLengthResponseGeneratorPosix,
VariableLengthResponseGeneratorPowerShell,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a common naming/selection strategy?

Client code must choose the current implementation for there use case, but it seems a simple label for a shell-setup is not sufficient. We have a posix operations module, but no one for response generators.

Why not have one posix module and it has all pieces needed?

Along the same lines, I think we should also avoid that client code has to switch class names per use case. Now thye need to select a module to import from and what to import also. This means that client code also needs to maintain a mapping of the API. If we would provide modules with the exact same API, client code could be simplified to just get the label of the "platform/shell-type" module to work with.

I also want to point out that the promise is that everything imported from datalad_next.shell is stable (or needs a minor/major release to change). We import the entire posix module, which not only has the pieces that aim to be a "public" API. I think this calls for a dedicated module that exposes the API.

Copy link
Member

@mih mih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into replacing the implementation of assert_ssh_access with this functionality. Here is some code that I came up with, based on the existing documentation

    ssh_bash_call = [
        ssh_bin,
        '-i', seckey,
        '-p', port,
        f'{login}@{host}',
        'bash',
    ]
    from more_itertools import consume
    with shell(ssh_bash_call) as ssh:
        consume(ssh(f'mkdir -p {path}'))
        consume(ssh(f'touch {path}/datalad-tests-probe'))

Two main issues confused me. The top-level documentation makes the impression that all commands need to be byte-strings (so no f-string formatting is possible). But fortunately this is not true.

It took me a moment to understand the implications of each line of the doc example shell().

print(b''.join(result))

Is not just a demo. But the command associated with result does not actually run without it.

I think that ShellCommandExecutor.__call__ should not have this behavior. myshell(something) appears to execute a command, but it really gives a handler that needs more boilerplate code to actually do things. At minimum something like

consume(myshell(something))

I believe __call__() should better do that. Maybe have a check= like subprocess.run().

Strangely, when I run a command that fails, I get a CommandError immediately. The docs suggest that there should be no exception, but I have to inspect a returncode manually.

In some sense I am glad that I do not have to implement a custom error handler for each execution, but I am also not sure if I am doing things correctly.

Overall, I feel we need more examples in the top-level docs. Use cases are similar to the call_git helpers:

  • run a command that is expected to work, raise exception if not
  • check if a command fails or not, return bool
  • run a command that is expected to fail, act on returncode
  • run a command that produces output, variants
    • expect and produce one line, fail if more
    • expect and produce any number of lines

@christian-monch
Copy link
Contributor Author

I looked into replacing the implementation of assert_ssh_access with this functionality. Here is some code that I came up with, based on the existing documentation

    ssh_bash_call = [
        ssh_bin,
        '-i', seckey,
        '-p', port,
        f'{login}@{host}',
        'bash',
    ]
    from more_itertools import consume
    with shell(ssh_bash_call) as ssh:
        consume(ssh(f'mkdir -p {path}'))
        consume(ssh(f'touch {path}/datalad-tests-probe'))

Two main issues confused me. The top-level documentation makes the impression that all commands need to be byte-strings (so no f-string formatting is possible). But fortunately this is not true.

A thanks for catching that, I modified the code according to a reviewer-comment and forgot to change the docs.

It took me a moment to understand the implications of each line of the doc example shell().

print(b''.join(result))

Is not just a demo. But the command associated with result does not actually run without it.

I think that ShellCommandExecutor.__call__ should not have this behavior. myshell(something) appears to execute a command, but it really gives a handler that needs more boilerplate code to actually do things. At minimum something like

consume(myshell(something))

I believe __call__() should better do that. Maybe have a check= like subprocess.run().

We can do that, it would mean all output is assembled before it is returned to the user. I assume that we don't expect long slow outputs, e.g. find /, so that would be OK. What do you think about collecting all output in __call__() and adding a second method that would return a generator?

Strangely, when I run a command that fails, I get a CommandError immediately. The docs suggest that there should be no exception, but I have to inspect a returncode manually.

Thanks, will fix the docs (also an oversight after addressing a reviewer comment).

In some sense I am glad that I do not have to implement a custom error handler for each execution, but I am also not sure if I am doing things correctly.

Overall, I feel we need more examples in the top-level docs. Use cases are similar to the call_git helpers:

  • run a command that is expected to work, raise exception if not

  • check if a command fails or not, return bool

  • run a command that is expected to fail, act on returncode

  • run a command that produces output, variants

    • expect and produce one line, fail if more
    • expect and produce any number of lines

These are very useful suggestions, I suggest doing the following:

  1. Fix the documentation regarding input types and exceptions
  2. modify __call__ to:
    2.1. accept a check-argument (default: False) that determines whether an exception is raised when the remote command fails.
    2.2. collect all output and return a result dict or dataclass that contains stdout and the return code (unless check is True and the remote command exits with non-zero.
  3. add another execution call that returns a generator (useful for timely output processing of long running remote commands)
  4. add more examples to top level documentation according to the list above.

mih added a commit to mih/datalad-next that referenced this pull request Apr 17, 2024
See datalad#654

Like require employing datalad#596
to solve properly.
TODO: update docstrings

1. make shell.__call__ consume everything
2. add shell.start
3. don't raise CommandError by default
4. add check to enable CommandError-raising
5. small improvements
6. extend tests
christian-monch and others added 8 commits April 17, 2024 14:58
Previously, this has been a series of SSH calls, each running in its own
shell. Now a persistent shell is used to interleave remote and local
checks.
This commit adapts the changes made by
mih in his forked branch to the changes
that were made in this branch in the
meantime.
@mih
Copy link
Member

mih commented Apr 18, 2024

I will merge this now. There are some unresolved things and tests should still run. But there are too many things pending and we will work on them in parallel for better throughput.

@mih mih merged commit 5a12b73 into datalad:main Apr 18, 2024
3 of 6 checks passed
This was referenced Apr 23, 2024
@christian-monch christian-monch deleted the ssh-shell branch July 16, 2024 10:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants