Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use iter_subproc instead of ThreadedRunner in SshUrlOperations #566

Conversation

christian-monch
Copy link
Contributor

@christian-monch christian-monch commented Dec 11, 2023

This PR uses iter_subproc instead of ThreadedRunner from datalad-core to implement SshUrlOperations. It also introduces the align_pattern iterator, which is used by SshUrlOperations to reliably detect marker in the output stream of a ssh-shell.

christian-monch and others added 18 commits December 6, 2023 14:21
This commit adds the `align_pattern`-iterator
including tests.

The `align_pattern`-iterator is used in
the `iter_subproc`-based implementation
of `SshUrlOperations`.
This commit replaces all uses of
`datalad.runner.nonasyncrunner` in
`datalad_next.url_operations.ssh`
with `datalad_next.runners.iter_subproc`

The commit also adapts the tests.
Make key exception type available from the same location.
Unclear why useful
Rather than feeding everything through the pattern aligner.
There seems to be no gain, and it can be easily done, it appears.
It is unclear to me why it was removed.
The code that closed `dst_fp` was removed by
mistake. This commit brings it back.

The commit addresses reviewer comment:
datalad#546 (comment)
This commit moves the calls to
`SshUrlOperations._progress_report_stop` into
the `after`-function of `side_effect`.

This moves the progress logic into a "single"
place, which makes it easier to factor a
progress reporting iterator out in the next
commit.
This commit factors out the use of `side_effect`
in `SshUrlOperation` to implement progress
reporting.

It creates the helper-function
`UrlOperations._reporting` the takes an
`Iterable` and yields the items in the
iterable while calling progress report
methods for each item. It also starts
and ends the progress reporting.
This commit updates the docstring of `align_pattern`
to document that `align_pattern` can be "removed"
from a stream after it yields an item.

This is used in `SshUrlOperations._get_props`
to return a "clean" stream-iterator, i.e. a stream
iterator that does not still execute the
`align_pattern`-code.

(The code was actually already doing the right
thing, but comments were still suggesting, that
`align_pattern` stays in the loop.)
This commit adds a simple fix for the test
`test_exception_from_return_code` to allow
it to pass if the command output is localized
for `de_DE`.
This commit catches `StopIteration`-excptions
that are raised in an `iter_subproc`-context
to ensure that `IterableSubprocessError`
generation is performed (if the subprocess
returned non-zero).

Without catching `StopIteration`-exceptions
within the subproc-context, a `next()` on an
exhausted iterator would trigger a code-path
that does not check the
return code of the subprocess. This path is
actually triggered by almost all exceptions,
but we generate `StopIteration` as part of
our error-free program flow, so we have to
deal with it at some point.

There are a few possible solutions here:

1. Do what this commit does. Don't let
   expected exceptions leave the
   subproc-context.

2. Modify the code in such a way, that
   it does not use `next` on iterators.

3. Ensure that `iterable_subprocess`
   executes the check for
   `IterableSubprocessError` in the
   presence of an exception from the
   context. This could easily be done
   by putting the check into a
   `finally`-branch. But that would
   "hide" a code exception, if the
   subprocess does exit with non-zero.
   (And would not pass a lot of tests)
This commit finalizes the changes in the previous
commit, which renamed the usage of
`UrlOperations._reporting` to `UrlOperations._with_progress`.
This commit performs the missing implementation renaming.
This commit converts the generator function
`output_from` into an instance of a the
generator class `OutputFrom`.

The rationale for this conversion is that
the generator class can store the
return code of the subprocess as attribute.
Thus the returncode is available, even
if an exception is raised within the
`iterable_subprocess`-context.
This commit adds some basic tests to verify
that the `returncode` attribute of the
`iterable_subprocess` generator is properly
set.
This commit modifies SshUrlOperations to make use
of the `returncode`-attribute of the generator that
is returned by `with iterable_subprocess`.

This allows to fetch the return code of the subprocess
even if no `IterableSubprocessError` was raised.
This situation can occur, if an exception is raised
in an `iterable_subprocess`-context.
@christian-monch christian-monch force-pushed the demo-ssh-url-with-iter-proc-returncode branch from cae9210 to e2937d8 Compare December 11, 2023 08:47
Copy link

codecov bot commented Dec 11, 2023

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (8a40cd5) 92.68% compared to head (ecf1f23) 92.57%.
Report is 1 commits behind head on main.

Files Patch % Lines
datalad_next/url_operations/ssh.py 84.84% 5 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #566      +/-   ##
==========================================
- Coverage   92.68%   92.57%   -0.11%     
==========================================
  Files         142      144       +2     
  Lines       10465    10483      +18     
  Branches     1556     1560       +4     
==========================================
+ Hits         9699     9705       +6     
- Misses        597      606       +9     
- Partials      169      172       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@christian-monch christian-monch changed the title Use returncode attribute of iter_subproc result in SshUrlOperations Use iter_subproc instead of ThreadedRunner in SshUrlOperations Dec 14, 2023
@christian-monch
Copy link
Contributor Author

Merged main again to fetch code from PR #570

In some cases ssh-commands return a zero
exit value, but we cannot read all expected
data, i.e. we get a `StopIteration` exception
when calling `next(stream)`. This was not
expected and lead to an access to a not
existing variable.

This commit fixes this by assuming that
the combination of a `0` return code and
a short read indicates that the file in
question does not exist anymore.
In some cases `SshUrlOperations._get_props()`
raises a `StopIteration`-exceptions. This
usually indicates a non-existing resource.
The return code of the `ssh`-subprocess might
be 255, because the process got terminated
when leaving the `iter_subproc` context (due
to the exception) before it exited with 244,
which would usually indicate a non-existing
resource.

We therefore handle `StopIteration` differntly,
i.e. we do not check the return code if a
`StopIteration`-exception occurs. Instead
we always assume a non-existing resource.
This commit escapes backslashes properly.
The old code would send byte values
0x01, 0x02, 0x03 into commands on the
ssh-server side.
@mih
Copy link
Member

mih commented Dec 16, 2023

Merged as #573

@mih mih closed this Dec 16, 2023
@christian-monch christian-monch deleted the demo-ssh-url-with-iter-proc-returncode branch July 16, 2024 10:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants