[nix] tests are failing/flakey #2103

colemickens · 2023-08-17T14:30:18Z

Description

Hi! Hope all is well.

jj is not building with nix right now, not sure if its a sandbox thing, guessing from the test names?

Steps to Reproduce the Problem

nix build github:martinvonz/jj (or nix build github:martinvonz/jj/7ad2270c0)

Expected Behavior

It builds.

Actual Behavior

It fails when testing:

        FAIL [   0.116s] jj-lib::test_bad_locking test_bad_locking_children::git_backend
        FAIL [   0.120s] jj-lib::test_bad_locking test_bad_locking_children::local_backend

fuller log:

     Running [ 00:00:04] [===> ] 619/952: 14 running, 619 passed, 0 skipped     ^M     Running [ 00:00:04] [===> ] 620/952: 13 running, 620 passed, 0 skipped     ^M     Running [ 00:00:04] [===> ] 620/952: 14 running, 620 passed, 0 skipped     ^M     Running [ 00:00:04] [===> ] 620/952: 15 running, 620 passed, 0 skipp>
     Running [ 00:00:04] [===> ] 620/952: 16 running, 620 passed, 0 skipped     ^M        PASS [   0.018s] jj-lib::test_diff_summary test_matcher_dir_file_transition::local_backend
     Running [ 00:00:04] [===> ] 621/952: 15 running, 621 passed, 0 skipped     ^M     Running [ 00:00:04] [===> ] 622/952: 14 running, 622 passed, 0 skipped     ^M     Running [ 00:00:04] [===> ] 622/952: 15 running, 622 passed, 0 skipped     ^M        PASS [   0.049s] jj-lib::test_default_revset_graph_iterator test_>
     Running [ 00:00:04] [===> ] 622/952: 16 running, 622 passed, 0 skipped     ^M     Running [ 00:00:04] [===> ] 623/952: 15 running, 623 passed, 0 skipped     ^M        FAIL [   0.116s] jj-lib::test_bad_locking test_bad_locking_children::git_backend

--- STDOUT:              jj-lib::test_bad_locking test_bad_locking_children::git_backend ---

running 1 test
test test_bad_locking_children::git_backend ... FAILED

failures:

failures:
    test_bad_locking_children::git_backend

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 3 filtered out; finished in 0.11s


--- STDERR:              jj-lib::test_bad_locking test_bad_locking_children::git_backend ---
thread 'test_bad_locking_children::git_backend' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 5, kind: Uncategorized, message: "Input/output error" }', lib/tests/test_bad_locking.rs:78:55
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

     Running [ 00:00:04] [===> ] 623/952: 16 running, 623 passed, 0 skipped     ^M   Canceling due to test failure: 15 tests still running
     Running [ 00:00:04] [] 624/952: 15 running, 623 passed, 1 failed, 0 skipped     ^M^M        PASS [   0.016s] jj-lib::test_diff_summary test_matcher_normal_cases::local_backend
   Canceling [ 00:00:04] [] 624/952: 15 running, 623 passed, 1 failed, 0 skipped     ^M^M   Canceling [ 00:00:04] [] 625/952: 14 running, 624 passed, 1 failed, 0 skipped     ^M^M   Canceling [ 00:00:04] [] 625/952: 15 running, 624 passed, 1 failed, 0 skipped     ^M^M        PASS [   0.022s] jj-lib::test_diff_summary t>
   Canceling [ 00:00:04] [] 625/952: 16 running, 624 passed, 1 failed, 0 skipped     ^M^M        FAIL [   0.120s] jj-lib::test_bad_locking test_bad_locking_children::local_backend

--- STDOUT:              jj-lib::test_bad_locking test_bad_locking_children::local_backend ---

running 1 test
test test_bad_locking_children::local_backend ... FAILED

failures:

failures:
    test_bad_locking_children::local_backend

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 3 filtered out; finished in 0.11s


--- STDERR:              jj-lib::test_bad_locking test_bad_locking_children::local_backend ---
thread 'test_bad_locking_children::local_backend' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 5, kind: Uncategorized, message: "Input/output error" }', lib/tests/test_bad_locking.rs:48:54
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

   Canceling [ 00:00:04] [] 626/952: 15 running, 625 passed, 1 failed, 0 skipped     ^M^M        PASS [   0.020s] jj-lib::test_diff_summary test_sorting::git_backend
   Canceling [ 00:00:04] [] 627/952: 14 running, 625 passed, 2 failed, 0 skipped     ^M^M        PASS [   0.020s] jj-lib::test_diff_summary test_sorting::local_backend
   Canceling [ 00:00:04] [] 628/952: 13 running, 626 passed, 2 failed, 0 skipped     ^M^M        PASS [   0.014s] jj-lib::test_diff_summary test_tree_file_transition::local_backend
   Canceling [ 00:00:04] [] 629/952: 12 running, 627 passed, 2 failed, 0 skipped     ^M^M        PASS [   0.045s] jj-lib::test_default_revset_graph_iterator test_graph_iterator_virtual_octopus::skip_transitive_edges
   Canceling [ 00:00:04] [] 630/952: 11 running, 628 passed, 2 failed, 0 skipped     ^M^M        PASS [   0.016s] jj-lib::test_diff_summary test_types::git_backend
   Canceling [ 00:00:04] [] 631/952: 10 running, 629 passed, 2 failed, 0 skipped     ^M^M        PASS [   0.017s] jj-lib::test_diff_summary test_tree_file_transition::git_backend
   Canceling [ 00:00:04] [] 632/952: 9 running, 630 passed, 2 failed, 0 skipped     ^M^M        PASS [   0.018s] jj-lib::test_diff_summary test_types::local_backend
   Canceling [ 00:00:04] [] 633/952: 8 running, 631 passed, 2 failed, 0 skipped     ^M^M        PASS [   0.019s] jj-lib::test_git test_export_partial_failure
   Canceling [ 00:00:04] [] 634/952: 7 running, 632 passed, 2 failed, 0 skipped     ^M^M   Canceling [ 00:00:04] [] 634/952: 7 running, 632 passed, 2 failed, 0 skipped     ^M^M        PASS [   0.024s] jj-lib::test_git test_export_conflicts
   Canceling [ 00:00:04] [] 635/952: 6 running, 633 passed, 2 failed, 0 skipped     ^M^M        PASS [   0.023s] jj-lib::test_git test_export_import_sequence
   Canceling [ 00:00:04] [] 636/952: 5 running, 634 passed, 2 failed, 0 skipped     ^M^M        PASS [   0.032s] jj-lib::test_git test_bulk_update_extra_on_import_refs
   Canceling [ 00:00:04] [] 637/952: 4 running, 635 passed, 2 failed, 0 skipped     ^M^M        PASS [   0.035s] jj-lib::test_git test_concurrent_write_commit
   Canceling [ 00:00:04] [] 638/952: 3 running, 636 passed, 2 failed, 0 skipped     ^M^M        PASS [   0.046s] jj-lib::test_git test_concurrent_read_write_commit
   Canceling [ 00:00:04] [] 639/952: 2 running, 637 passed, 2 failed, 0 skipped     ^M^M   Canceling [ 00:00:04] [] 639/952: 2 running, 637 passed, 2 failed, 0 skipped     ^M^M   Canceling [ 00:00:05] [] 640/952: 1 running, 638 passed, 2 failed, 0 skipped     ^M^M        PASS [   2.715s] jj-cli::test_log_command test_>
   Canceling [ 00:00:05] [] 640/952: 1 running, 638 passed, 2 failed, 0 skipped     ^M^M   Canceling [ 00:00:05] [] 640/952: 1 running, 638 passed, 2 failed, 0 skipped     ^M^M   Canceling [ 00:00:05] [] 640/952: 0 running, 639 passed, 2 failed, 0 skipped     ^M^M------------
     Summary [   5.053s] 641/952 tests run: 639 passed, 2 failed, 0 skipped
        FAIL [   0.116s] jj-lib::test_bad_locking test_bad_locking_children::git_backend
        FAIL [   0.120s] jj-lib::test_bad_locking test_bad_locking_children::local_backend
   Canceling [ 00:00:05] [] 641/952: 0 running, 639 passed, 2 failed, 0 skipped     ^M^Merror: test run failed

Specifications

╭ zeph  ~/code/nixcfg 11sec 2ms
╰─▶ cat /etc/os-release
BUG_REPORT_URL="https://github.com/NixOS/nixpkgs/issues"
BUILD_ID="23.11.20230816.91ee80b"
DOCUMENTATION_URL="https://nixos.org/learn.html"
HOME_URL="https://nixos.org/"
ID=nixos
LOGO="nix-snowflake"
NAME=NixOS
PRETTY_NAME="NixOS 23.11 (Tapir)"
SUPPORT_URL="https://nixos.org/community.html"
VERSION="23.11 (Tapir)"
VERSION_CODENAME=tapir
VERSION_ID="23.11"

(Though this shouldn't matter, I don't override jj's nixpkgs)

The text was updated successfully, but these errors were encountered:

colemickens · 2023-08-17T14:31:03Z

note, while I was leaving open the chance for flakiness, I haven't actually seen these be flakey, just consistently failing.

colemickens · 2023-08-17T14:47:07Z

Odd... pinning a few older revisions isn't helping, and then the blame via GitHub UI doesn't indicate changes to that test function recently.

necauqua · 2023-08-17T17:19:45Z

This is odd, I do nix profile upgrade every few days and it's fine, and right now for me it succeeded too (and then nix build just immediately links it to result)

Looking at it being the classic generic "Input/output error" from the OS, maybe you have some exotic/broken filesystem or something?.

Basically the sandbox didn't work 😄

colemickens · 2023-08-17T17:32:15Z

Yeah, it's bizarre. I've tried it on two different machines now, albeit they're practically identical.

My root is zfs, my tmp is tmpfs. Not sure... I might file a nixpkgs bug.

colemickens · 2023-08-20T05:01:43Z

In the issue I filed on Nix, someone else said they were able to build. I'm building it in a loop.... sometimes only one test fails, but I haven't had a pass yet (even though it should be building on a single core with the settings I'm using).

colemickens · 2023-08-20T05:33:40Z

I think the tests are flakey. I finally got it to build:

nu─▶  $env.NIX_BUILD_CORES = 1
nu─▶  nix build -L github:martinvonz/jj --option 'cores' '1' -j1

Finally built it.

martinvonz · 2023-08-20T05:57:19Z

The tests do a recursive file system copy of some repos they create. They do that by walking the file system. Could it be that you have something running in the background that notices these temporary directories and adds a temporary file in them, which the test might then fail to copy if the background process had already deleted it.

colemickens · 2023-08-20T06:30:22Z

I'm not sure why changing it to be single-threaded would be the issue, if the implication is it's something outside the build itself.
Nix should be sandboxing this, such that it's being built in its own private /tmp/ directory.
I do use this, which uses FB's watchman: https://github.com/jgavris/rs-git-fsmonitor but I'm not really sure that it writes files into its watched dirs/repos... or again, why it would magically go away as soon as I run a single-threaded build.

Not to hammer the point too hard, but I had dozens of failures before setting it to a single core, and then it immediately built.

martinvonz · 2023-08-20T06:59:26Z

Since you're able to reproduce it quite reliably, could you try applying this and see what it reports? I'm curious to see if it's a particular path every time.

diff --git a/lib/tests/test_bad_locking.rs b/lib/tests/test_bad_locking.rs
index 68ac86ebc9...0f31ee743a 100644
--- a/lib/tests/test_bad_locking.rs
+++ b/lib/tests/test_bad_locking.rs
@@ -45,7 +45,9 @@
         if child_left.is_dir() {
             sub_dirs.push(base_name.to_os_string());
         } else {
-            std::fs::copy(&child_left, child_output).unwrap();
+            if let Some(err) = std::fs::copy(&child_left, &child_output).err() {
+                panic!("failed to copy {child_left:?} to {child_output:?}: {err}");
+            }
         }
     }
     // Walk the base and find files removed in the right side, then remove them in
@@ -75,7 +77,9 @@
         } else if !child_base.exists() {
             // This overwrites the left side if that's been written. That's fine, since the
             // point of the test is that it should be okay for either side to win.
-            std::fs::copy(&child_right, child_output).unwrap();
+            if let Some(err) = std::fs::copy(&child_right, &child_output).err() {
+                panic!("failed to copy {child_right:?} to {child_output:?}: {err}");
+            }
         }
     }
     // Do the merge in subdirectories

colemickens · 2023-08-20T16:03:20Z

Here's what I'm testing:
colemickens@3f732df

nix build github:colemickens/jj/3f732df20a76946b55f8bd5480923e598f314a7a -L [--rebuild]

(so default multi-core-ish behavior)

run 1

jujutsu-unstable> --- STDERR:              jj-lib::test_bad_locking test_bad_locking_children::git_backend ---                                                              
jujutsu-unstable> thread 'test_bad_locking_children::git_backend' panicked at 'failed to copy "/build/jj-test-4LYqZV/.jj/repo/index/operations/8165661ec2f17a967a621aa4b465a
fda6bf5335ba44624094ee9d71ee84e07fb587a62b343de6cd1ed7636b1de73e23ae7e2f78f796914b35cc1444f3d3fea80" to "/build/jj-test-V0CQnr/.jj/repo/index/operations/8165661ec2f17a967a6
21aa4b465afda6bf5335ba44624094ee9d71ee84e07fb587a62b343de6cd1ed7636b1de73e23ae7e2f78f796914b35cc1444f3d3fea80": Input/output error (os error 5)', lib/tests/test_bad_locking
.rs:49:17

jujutsu-unstable> --- STDERR:              jj-lib::test_bad_locking test_bad_locking_children::local_backend ---                                                                                                                       
jujutsu-unstable> thread 'test_bad_locking_children::local_backend' panicked at 'failed to copy "/build/jj-test-CVjNHq/.jj/repo/store/files/9063990e5c5b2184877f92adace7c801a549b00c39cd7549877f06d5dd0d3a6ca6eee42d5896bdac64831c8114c
55cee664078bd105dc691270c92644ccb2ce7" to "/build/jj-test-Qp4aFE/.jj/repo/store/files/9063990e5c5b2184877f92adace7c801a549b00c39cd7549877f06d5dd0d3a6ca6eee42d5896bdac64831c8114c55cee664078bd105dc691270c92644ccb2ce7": Input/output e
rror (os error 5)', lib/tests/test_bad_locking.rs:49:17

run 2: ... of course it passes....
run 3: with --rebuild...

jujutsu-unstable> --- STDERR:              jj-lib::test_bad_locking test_bad_locking_children::local_backend ---
jujutsu-unstable> thread 'test_bad_locking_children::local_backend' panicked at 'failed to copy "/build/jj-test-VjtkfG/.jj/repo/store/trees/ec27fb5fc356eeeeb77ed78bc8e425ce0648796fb272b83aee088e143d60fdc077bfc884c1119ef69e90822f9b6d7f8203a2142c04b3af2549c0636e7d0d621d" to "/build/jj-test-5xomD8/.jj/repo/store/trees/ec27fb5fc356eeeeb77ed78bc8e425ce0648796fb272b83aee088e143d60fdc077bfc884c1119ef69e90822f9b6d7f8203a2142c04b3af2549c0636e7d0d621d": Input/output error (os error 5)', lib/tests/test_bad_locking.rs:49:17


jujutsu-unstable> --- STDERR:              jj-lib::test_bad_locking test_bad_locking_children::git_backend ---
jujutsu-unstable> thread 'test_bad_locking_children::git_backend' panicked at 'failed to copy "/build/jj-test-QlkzzQ/.jj/repo/op_store/views/ce408fc4a25805be3351cf04e48117d4b7549355d10d3d310cf0452142b2770cf2170fcbbd6419e6b92726369e87b9e0a59c1e506eaa1375e20f8259813849ab" to "/build/jj-test-btsajD/.jj/repo/op_store/views/ce408fc4a25805be3351cf04e48117d4b7549355d10d3d310cf0452142b2770cf2170fcbbd6419e6b92726369e87b9e0a59c1e506eaa1375e20f8259813849ab": Input/output error (os error 5)', lib/tests/test_bad_locking.rs:81:17

Let me know if you want me to script this to get pass/fail rate, or make a log of the fail cases, or if this is sufficient.

martinvonz · 2023-08-20T16:17:40Z

I think that's sufficient, thanks. I still have no idea why it would sometimes fail to copy some files, however. If you're curious enough, I suppose you could reduce the test case to just manually build up some random directories and see if running the same merge_directories() function on them also fails.

I ran into some issues here when switching our tests to use `.jj`-internal git repos. For example, the `std::fs::copy()` calls started failing, which may be related to #2103. I think one problem is that we could end up calling `merge_directories()` twice for the same directory. This patch fixes that by deduping the paths we call with, and makes the function assume that the output directory doesn't exist.

I ran into some issues here when switching our tests to use `.jj`-internal git repos. For example, the `std::fs::copy()` calls started failing, which may be related to jj-vcs#2103. I think one problem is that we could end up calling `merge_directories()` twice for the same directory. This patch fixes that by deduping the paths we call with, and makes the function assume that the output directory doesn't exist.

colemickens mentioned this issue Aug 17, 2023

build fails on my machines, but not [for] others: martinvonz/jj NixOS/nix#8843

Closed

martinvonz mentioned this issue Sep 17, 2023

test_bad_locking: make merge_directories() expect non-existent target #2262

Merged

4 tasks

martinvonz mentioned this issue Nov 8, 2023

Panic while executing jj log #2549

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[nix] tests are failing/flakey #2103

[nix] tests are failing/flakey #2103

colemickens commented Aug 17, 2023

colemickens commented Aug 17, 2023

colemickens commented Aug 17, 2023

necauqua commented Aug 17, 2023 •

edited

Loading

colemickens commented Aug 17, 2023

colemickens commented Aug 20, 2023

colemickens commented Aug 20, 2023

martinvonz commented Aug 20, 2023

colemickens commented Aug 20, 2023

martinvonz commented Aug 20, 2023 •

edited

Loading

colemickens commented Aug 20, 2023

martinvonz commented Aug 20, 2023

[nix] tests are failing/flakey #2103

[nix] tests are failing/flakey #2103

Comments

colemickens commented Aug 17, 2023

Description

Steps to Reproduce the Problem

Expected Behavior

Actual Behavior

Specifications

colemickens commented Aug 17, 2023

colemickens commented Aug 17, 2023

necauqua commented Aug 17, 2023 • edited Loading

colemickens commented Aug 17, 2023

colemickens commented Aug 20, 2023

colemickens commented Aug 20, 2023

martinvonz commented Aug 20, 2023

colemickens commented Aug 20, 2023

martinvonz commented Aug 20, 2023 • edited Loading

colemickens commented Aug 20, 2023

martinvonz commented Aug 20, 2023

necauqua commented Aug 17, 2023 •

edited

Loading

martinvonz commented Aug 20, 2023 •

edited

Loading