Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/test-runner: Fix execute() flakiness #142747

Merged
merged 2 commits into from
Nov 2, 2021

Conversation

dasJ
Copy link
Member

@dasJ dasJ commented Oct 24, 2021

Motivation for this change
Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests) → switchTest
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • 21.11 Release Notes (or backporting 21.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@dasJ dasJ requested a review from tfc as a code owner October 24, 2021 13:49
@github-actions github-actions bot added the 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS label Oct 24, 2021
@dasJ dasJ force-pushed the fix/test-runner-execute branch from 212594b to 0af4309 Compare October 24, 2021 13:51
@dasJ dasJ added the 6.topic: testing Tooling for automated testing of packages and modules label Oct 24, 2021
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 labels Oct 24, 2021
# (for example when 4094 bytes of output were written, half
# of the status code magic would be in one chunk and the other
# half in the next one).
chunk_to_check = prev_chunk + chunk
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that chunks are sufficiently big; >= 0.5 * magic

chunk_to_check: it's not really a chunk anymore when they've been combined. Maybe bytes_to_check?

Maybe StreamReader.readuntil could make this code redundant?

Otherwise, see below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried StreamReader but it's async stuff and I'm not going to implement that.

return (status_code, (output + output2).decode())

output += prev_chunk
prev_chunk = chunk
Copy link
Member

@roberth roberth Oct 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could avoid the minimum chunk size assumption by taking something like chunk_to_check[-max_match_size:] instead of chunk. Also, prev_chunk should probably be renamed to something like search_buffer.

@@ -583,23 +583,44 @@ def require_unit_state(self, unit: str, require_state: str = "active") -> None:
)

def execute(self, command: str) -> Tuple[int, str]:
status_code_magic = "|!=EOF"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the trick used in HTTP chunked encoding. If you use a sufficiently long random string, the probability of a false positive is effectively zero.

status_code = int(status_code_b.strip())
return (status_code, (output + output2).decode())

output += prev_chunk
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could create a queue of chunks instead, joining them in one go before .decode(). That avoids quadratic complexity.

@dasJ
Copy link
Member Author

dasJ commented Oct 24, 2021

I don't care enough about this problem to learn Python, sorry. Maybe someone else will come up with a better understanding of the language and a better solution (cc @K900).

# exit code it has, so we execute the command, output a special magic
# string afterwards and then the return code.
out_command = (
f"( set -euo pipefail; {command} ); echo '{status_code_magic}' $?\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

last time we did this sort of thing we did something like this:

(
  _stdout=$(mktemp)
  _stderr=$(mktemp)
  trap 'rm $_stdout $_stderr' EXIT
  (set -euo pipefail; {command}) >$_stdout 2>$_stderr
  echo RESULT: $? $(stat -c $_stdout $_stderr)
  cat $_stdout $_stderr
)

while that is significant extra wrapping it also obviates the need to check for magic substrings, replies can be handled as having one readable line of header and two known-length data blobs following—any output on stderr immediately signals a failure of the wrapper. though that only works well if commands don't echo gigabytes worth of data on stdout/stderr. :/

@dasJ dasJ force-pushed the fix/test-runner-execute branch from 0af4309 to 71d9e16 Compare October 24, 2021 19:23
@dasJ
Copy link
Member Author

dasJ commented Oct 24, 2021

Looks like I do care… I reimplemented everything to use base64. This way we get around this shifting window stuff by just matching \n.

decoded = chunk.decode()
print(f"c={decoded}")
output_buffer += [decoded]
if decoded[-1] == "\n":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this guaranteed to be the last character? What happens if we receive two lines in one recv call?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then this breaks. But I don't see how we would receive two lines when we base64-encode the output and tell base64 to not add \n after 76 characters

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the exit code is won't be printed before this method returns, so lgtm.

@ofborg ofborg bot added the ofborg-internal-error Ofborg encountered an error label Oct 25, 2021
@dasJ
Copy link
Member Author

dasJ commented Oct 25, 2021

@GrahamcOfBorg eval

@cole-h
Copy link
Member

cole-h commented Oct 26, 2021

@ofborg eval

@cole-h cole-h removed the ofborg-internal-error Ofborg encountered an error label Oct 26, 2021
@dasJ dasJ force-pushed the fix/test-runner-execute branch from b55613b to cbde380 Compare October 26, 2021 21:23
@github-actions github-actions bot added the 8.has: documentation This PR adds or changes documentation label Oct 26, 2021
@dasJ
Copy link
Member Author

dasJ commented Oct 26, 2021

I had to add a flag to execute() for the reasons outlined in the documentation

@ofborg ofborg bot added the ofborg-internal-error Ofborg encountered an error label Oct 26, 2021
@dasJ dasJ force-pushed the fix/test-runner-execute branch from cbde380 to e1c2abe Compare October 26, 2021 22:31
@cole-h cole-h removed the ofborg-internal-error Ofborg encountered an error label Oct 27, 2021
@ofborg ofborg bot added the ofborg-internal-error Ofborg encountered an error label Oct 27, 2021
@dasJ dasJ force-pushed the fix/test-runner-execute branch from e1c2abe to fd8ecd6 Compare October 27, 2021 08:56
@dasJ dasJ removed the ofborg-internal-error Ofborg encountered an error label Oct 27, 2021
@@ -582,24 +582,37 @@ def require_unit_state(self, unit: str, require_state: str = "active") -> None:
+ "'{}' but it is in state ‘{}’".format(require_state, state)
)

def execute(self, command: str) -> Tuple[int, str]:
def _read_line_from_shell(self) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from following the notifications i see that there are still some changes happening, although i had no time this week to look deeper into all of them.

when you found the good/best working version, i suggest renaming this function. from a function with the current name i would expect that it reads output line by line so i could call it repeatedly (or use it as an iterable generator, which would also be very fancy if it yielded individual lines). but instead, the function blocks until it receives a long string with potentially multiple lines until the last received portion ends with a newline (which might not even always be the case?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was already done (at least that what I thought) but then I found it breaks the tests I now fix and then the documentation was wrong and so on… so it should now really be done.

What would you recommend as a name? It's actually correct imo because it reads a single line from the chardev that connects to the backdoor that is called shell but I can see how this generic name is misleading. Maybe something like _read_line_from_shell_pipe or something like this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While re-reading the text, carefully trying to describe the semantics in my own words, in order to use that for a good name, i got another question:

what does not decoded really mean? does this boil down to an empty string? i did not find an explanation in what cases decode() might return a null object

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have only found one case: this happens which is when the pipe breaks. You then get an empty string (or whatever the type is). The if not decoded is there to prevent infinite loops when that happens. We can probably not rescue from that situation anyway (pipe is alredy broken) and returning with parts of the output is preferrable to hanging.

The case with the broken pipe is where check_return comes into play. When you don't set it to False, execute() tries to write echo ${PIPESTATUS[0]}\n into the shell socket which fails when the pipe is broken, thus making the broken pipe detectable as opposed to just hanging forever.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pipe breaks should already be visible at the recv call, shouldn't they? what do you think about putting the check there and then call the function something along the lines next_newline_closed_block or something, because this is really what we do. (one might still take such unusual [but correct] names as a hint that the overall design of this function has not reached the ideal state yet although it works for our relevant cases)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tries to write echo ${PIPESTATUS[0]}\n into the shell socket which fails when the pipe is broken

This implies that the test runner (python) broke the pipe, which I can only see happening when the test fails or crashes, so breaking the pipe in this direction doesn't need to be considered.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it happens. When the hibernate test for example shuts down the VM by means of systemctl hibernate, the pipe breaks because the qemu process dies. That's exactly where I observed the broken pipes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah of course. These aren't normal processes.

Instead of using the magic string, we now just base64-encode everything
and check for a newline.
@dasJ
Copy link
Member Author

dasJ commented Nov 2, 2021

@tfc any updates to this? I'm starting to feel pressure from the soon to be expected feature freeze

@tfc tfc merged commit 2ba0732 into NixOS:master Nov 2, 2021
@dasJ dasJ deleted the fix/test-runner-execute branch November 2, 2021 21:40
@fgaz
Copy link
Member

fgaz commented Nov 4, 2021

Looks like this PR broke at least one nixos test: #144613

@vcunat
Copy link
Member

vcunat commented Nov 5, 2021

I was bisecting nixosTests.xfce now and ended up here (1640359; retried a few times on the commit and its parent to confirm). This currently blocks the nixos-unstable channel.

@dasJ
Copy link
Member Author

dasJ commented Nov 5, 2021

There's a workaround: #144679

@roberth
Copy link
Member

roberth commented Nov 5, 2021

@vcunat Fix for xfce is in #144795 now

dasJ added a commit to helsinki-systems/nixpkgs that referenced this pull request Nov 6, 2021
dasJ added a commit that referenced this pull request Nov 6, 2021
aszlig added a commit to openlab-aux/vuizvui that referenced this pull request Nov 7, 2021
Recently, the implementation behind Machine.execute() and thus also
Machine.succeed() has been changed[1] to pipe all the command's output
into base64 on the guest machine.

Unfortunately this means that base64 is blocking until stdout is closed,
which in turn means that we now need to make sure that whenever we run a
program in background via "&" we also need to make sure to close stdout.

In the PSI test, we're doing this by simply redirecting the output to
stderr.

[1]: NixOS/nixpkgs#142747

Signed-off-by: aszlig <[email protected]>
aszlig added a commit to aszlig/nixpkgs that referenced this pull request Mar 22, 2022
In NixOS#142747, the implementation
behind Machine.execute() has been changed to pipe all the command's
output into base64 on the guest machine.

Unfortunately this means that base64 is blocking until stdout is closed,
which in turn means that we now need to make sure that whenever we run a
program in background via "&" we also need to make sure to close stdout,
which we do by redirecting stdout to stderr.

Signed-off-by: aszlig <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 6.topic: testing Tooling for automated testing of packages and modules 8.has: documentation This PR adds or changes documentation 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants