Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/test-driver: Fix thread cleanup and execute #142560

Closed
wants to merge 1 commit into from

Conversation

dasJ
Copy link
Member

@dasJ dasJ commented Oct 22, 2021

Motivation for this change

The unstable channel is blocked by the test runner hanging in cleanup.
This is not my change, it was written by @K900 and I don't fully understand why it fixes things but it seems to make the test runner exit which it is currently not doing.

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • 21.11 Release Notes (or backporting 21.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@dasJ dasJ requested a review from tfc as a code owner October 22, 2021 11:05
@github-actions github-actions bot added the 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS label Oct 22, 2021
@dasJ dasJ mentioned this pull request Oct 22, 2021
12 tasks
@K900
Copy link
Contributor

K900 commented Oct 22, 2021

This seems to resolve a deadlock when shutting down a test while the QEMU process is hanging (which it seems to be doing consistently on Hydra, and not anywhere else I've tested). Also a few assorted cleanups to ensure we can't get deadlocked receiving outputs from a command.

@dasJ dasJ added 6.topic: testing Tooling for automated testing of packages and modules 1.severity: channel blocker Blocks a channel labels Oct 22, 2021
@dasJ
Copy link
Member Author

dasJ commented Oct 22, 2021

@GrahamcOfBorg test simple switchTest

@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 labels Oct 22, 2021
output += match[1]
status_code = int(match[2])
return (status_code, output)
chunk = self.shell.recv(4096)
output += chunk
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof, we may also want to do something about the quadratic complexity here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we'd want some sort of an incremental matcher, but in my limited testing, it seems like it's not really an issue - outputs don't generally exceed 4k, and even when they do, they don't exceed 8k.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a "solution" we could split this into two commands:

f"( set -euo pipefail; {command} ); echo "$?" > /.nixos-test-runner-exit-code\n"
"cat /.nixos-test-runner-exit-code"

This way we can get rid of the entire matching stuff

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do that, we could just use echo $? as the second command.


while True:
chunk = self.shell.recv(4096).decode(errors="ignore")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is there a reason we are no longer ignoring errors?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be necessary if we can no longer end up on a character boundary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there's a technicality here: a command could potentially output complete garbage, but in that case I'd much rather have it crash than return something that makes no sense.

@dasJ dasJ mentioned this pull request Oct 23, 2021
12 tasks
@dasJ dasJ removed the 1.severity: channel blocker Blocks a channel label Oct 23, 2021
@dasJ
Copy link
Member Author

dasJ commented Oct 23, 2021

The thing that is blocking the channel is fixed here: #142675, we can refactor execute() later.

@dasJ
Copy link
Member Author

dasJ commented Oct 24, 2021

Closing this since I reimplemented execute() refactor in #142747 now. This time there should not be quadratic complexity because we use a shifting window.

@dasJ dasJ closed this Oct 24, 2021
@dasJ dasJ deleted the fix/test-runner-cleanup branch October 24, 2021 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 6.topic: testing Tooling for automated testing of packages and modules 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants