Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_vfs_works failed on Mac OS X #734

Closed
gfreezy opened this issue Feb 3, 2019 · 23 comments
Closed

test_vfs_works failed on Mac OS X #734

gfreezy opened this issue Feb 3, 2019 · 23 comments
Labels
E-has-instructions Issue has some instructions and pointers to code to get started E-medium good first issue

Comments

@gfreezy
Copy link
Contributor

gfreezy commented Feb 3, 2019

image

cargo 1.32.0 (8610973aa 2019-01-02)

@pnkfelix
Copy link
Member

pnkfelix commented Feb 12, 2019

I looked at this with @matklad today via zulip. Based on instrumentation I added as suggested by @matklad , we determined that OS X seems to generate a DebouncedEvent::Create in response to the fs::write call, rather than a Write as we had been expecting.

@matklad is planning to fix this on the side of rust-analyzer very soon (perhaps by folding together ChangeKind::Create and ChangeKind::Write, or at least handle them the same way more often in the VFS).

@pnkfelix
Copy link
Member

More info: further investigation revealed that the tempdir library is creating directories hosted under /var/, which in OS X is a symbolic link to /private/var/. So we get into a situation where the VFS internal database is trying to look for files that are rooted at /var/temp/dir/stuff/here/a/b/, but the OS X notifications are referencing flies that are canonically stored at /private/var/temp/dir/stuff/here/a/b/. And thus the VFS drops the notifications.

We are looking into it further now.

@pnkfelix pnkfelix changed the title test_vfs_works failed on macos 10.14.2 test_vfs_works failed on Mac OS X Feb 12, 2019
@pnkfelix
Copy link
Member

(BTW this is not specific to OS X version 10.14.2; I am using 10.13 and I believe this /private/ directory structure has been around for a long long time before this...)

@matklad
Copy link
Member

matklad commented Feb 12, 2019

Suggested fix: change RootConfig to

pub(crate) struct RootConfig {
    root: PathBuf,
    // result of root.canonicalize() if it is different from root
    canonical_root: Option<PathBuf>,
    excluded_dirs: Vec<PathBuf>,
}

and use both root and canonical_root in the contains method.

I did a quick skim over other usages of root, and it looks like it's the only place where we need to account for both original and canonical roots.

@matklad matklad added good first issue E-medium E-has-instructions Issue has some instructions and pointers to code to get started and removed E-hard labels Feb 12, 2019
@matklad
Copy link
Member

matklad commented Feb 12, 2019

Hm, thinking about it more, I think we should change excluded_dirs: Vec<PathBuf>, to be excluded_dirs: Vec<RelativePathBuf>, as well.

@matklad
Copy link
Member

matklad commented Feb 12, 2019

Q: why store two paths instead of just always canonicalizing the file?

A: we might be unable to canonicalize paths coming from the client of the language server! Consider the case where the client starts the server in a symlinked dir and then sends a "didOpenTextDocuemnt" request for a not-yet-saved document, using this non-canonical symlinked path. We would be unable to obtain canonical path here, because the file does not exists.

@pnkfelix
Copy link
Member

pnkfelix commented Feb 13, 2019

After implementing the experiment above (to store both the root and canonical_root and compare against both in the contains method, I now get further into the same test before it fails:

thread 'test_vfs_works' panicked at 'Expected [VfsChange::RemoveFile { path: removed_path, .. }, VfsChange::AddFile {
 text, path: added_path, .. }], got [AddFile { root: VfsRoot(1), file: VfsFile(5), path: "sub1/sub2/new1.rs", text: "new hello" }]', crates/ra_vfs/tests/vfs.rs:129:5

which corresponds to this code (I've added some dbg! instrumentation disrupting the line numbers):

https://github.com/rust-analyzer/rust-analyzer/blob/74d03d57e77b791b6973497f17a35136ddc295ab/crates/ra_vfs/tests/vfs.rs#L122-L130


However, there is still a Timeout occurring at some point prior to the panic (I replaced the unwrap here with something that just prints the result and returns in the error case, so obviously the test failure above is probably just masking the actual bug -- namely the timeout that probably shouldn't be happening.

@pnkfelix
Copy link
Member

... so far, regarding the failure described in the previous comment, my experiments seem to indicate that we might not be getting a RemoveFile event on Mac OS X in response to the rename? That is my current best guess, but I'm not sure how to go about confirming it (or even sidestepping it).

@matklad
Copy link
Member

matklad commented Feb 13, 2019

I'm not sure how to go about confirming it

I think that dbg! in the convert_notify_event function should give a pretty reliable picture. If it is indeed the case that the notify library does not send any event to us, it might be a good idea to make a standalone reproduction of this. Have you also tried adding timeouts in this case? :)

After implementing the experiment above

Could you send a PR with this? It definitely fixes at least one bug, and is a clear improvement.

@pnkfelix
Copy link
Member

yeah I've tried adding timeouts as well as file.sync_contents() and file.sync_all() calls to try to force the operation to complete before we process to the process_tasks() call. No luck yet.

@matklad
Copy link
Member

matklad commented Feb 13, 2019

I wonder if we should setup testing on MacOS...

My hope is to keep most of the rust-analyzer 100% platform-independent, such that there's no need to test on more than one platform (which is great for CI times!).

However we need to touch OS somewhere. Currently, this is isolated to ra_vfs, ra_project_model and ra_lsp_server crates. Ideally, we should test only this three things on all platforms.

EDIT: splitting of a separate issue
EDIT: #819

@pnkfelix
Copy link
Member

pnkfelix commented Feb 13, 2019

After implementing the experiment above

Could you send a PR with this? It definitely fixes at least one bug, and is a clear improvement.

posted in PR #818

(I did not attempt to make a regression test though. I might imagine ways could attempt to do so, but it seems like a bit of a OS-dependent rabbit hole.)

@pnkfelix
Copy link
Member

pnkfelix commented Feb 13, 2019

I'm not sure how to go about confirming it

I think that dbg! in the convert_notify_event function should give a pretty reliable picture. If it is indeed the case that the notify library does not send any event to us, it might be a good idea to make a standalone reproduction of this.

If you want to take a look, I posted the results of a log with my current instrumentation (as well as the diff for said instrumentation, atop PR #818) at the following gist:

https://gist.github.com/pnkfelix/985658b6d4b610cd72f435f952bdb53f

(this particular version has the sleep calls commented out, but the results were much the same when I had them in. In some scenarios I did notice glitches in the earlier "quux" test, where the event would have the text of the old version of the file, not the new one. But I have not yet determined exactly why that arose, non-deterministically it seemed.)

@matklad
Copy link
Member

matklad commented Feb 13, 2019

I think the interesting part of the log is

[crates/ra_vfs/src/io.rs:170] ("convert_notify_event", &event) = (
    "convert_notify_event",
    Create(
        "/private/var/folders/68/bxt927zn2tnbpj7qf4_4vpk80000gn/T/.tmpe8AoEX/a/b/baz.rs"
    )
)
[crates/ra_vfs/src/io.rs:170] ("convert_notify_event", &event) = (
    "convert_notify_event",
    Create(
        "/private/var/folders/68/bxt927zn2tnbpj7qf4_4vpk80000gn/T/.tmpe8AoEX/a/b/baz.rs"
    )
)
[crates/ra_vfs/src/io.rs:170] ("convert_notify_event", &event) = (
    "convert_notify_event",
    Create(
        "/private/var/folders/68/bxt927zn2tnbpj7qf4_4vpk80000gn/T/.tmpe8AoEX/a/sub1"
    )
)
[crates/ra_vfs/src/io.rs:170] ("convert_notify_event", &event) = (
    "convert_notify_event",
    Create(
        "/private/var/folders/68/bxt927zn2tnbpj7qf4_4vpk80000gn/T/.tmpe8AoEX/a/sub1/sub2/new1.rs"
    )
)
[crates/ra_vfs/tests/vfs.rs:44] ("fs_rename from:", _p, "to:", &to) = (
    "fs_rename from:",
    File {
        fd: 3,
        path: "/private/var/folders/68/bxt927zn2tnbpj7qf4_4vpk80000gn/T/.tmpe8AoEX/a/sub1/sub2/new.rs",
        read: true,
        write: false
    },
    "to:",
    "/var/folders/68/bxt927zn2tnbpj7qf4_4vpk80000gn/T/.tmpe8AoEX/a/sub1/sub2/new1.rs"
)

note that last too entries should probably be swapped (this is ok, they are from different threads).

My understanding is that, when we crate a/sub1/sub2/new.rs file, we correctly get event for a/sub1 folder, and then correctly watch it recursively. Note that we don't get an event for a/sub1/sub2/new.rs itself, but this is expected, as we only watch a folder. Because we rescan folders on the Create event, new.rs is correctly added to the vfs. During this rescan we also set watchers for sub1, sub2 and new.rs

When we rename, we do get evet for new1.rs (which means that we have set up watching correctly in the previous step), but we miss event for new.rs and ... I don't have any idea as to why that happens.

cc @vemoo

@vipentti
Copy link
Contributor

Not sure if related to this issue, but currently test_vfs_works fails on Windows as well.

Output:

λ cargo test -p ra_vfs --test vfs
    Finished dev [unoptimized + debuginfo] target(s) in 0.71s
     Running target\debug\deps\vfs-a65779a63b672f90.exe

running 1 test
test test_vfs_works ... FAILED

failures:

---- test_vfs_works stdout ----
thread 'test_vfs_works' panicked at 'Expected [VfsChange::RemoveFile { path: removed_path, .. }, VfsChange::AddFile {
 text, path: added_path, .. }], got [RemoveFile { root: VfsRoot(1), file: VfsFile(4), path: "sub1/sub2/new.rs" }]', crates\ra_vfs\tests\vfs.rs:122:5
stack backtrace:
   0: std::sys::windows::backtrace::set_frames
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libstd\sys\windows\backtrace\mod.rs:104
   1: std::sys::windows::backtrace::set_frames
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libstd\sys\windows\backtrace\mod.rs:104
   2: std::sys::windows::backtrace::set_frames
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libstd\sys\windows\backtrace\mod.rs:104
   3: std::sys_common::backtrace::print
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libstd\sys_common\backtrace.rs:59
   4: std::sys_common::backtrace::print
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libstd\sys_common\backtrace.rs:59
   5: std::panicking::default_hook
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libstd\panicking.rs:221
   6: std::panicking::rust_panic_with_hook
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libstd\panicking.rs:491
   7: std::panicking::continue_panic_fmt
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libstd\panicking.rs:398
   8: std::panicking::begin_panic_fmt
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libstd\panicking.rs:353
   9: vfs::test_vfs_works
             at .\tests\vfs.rs:122
  10: vfs::test_vfs_works::{{closure}}
             at .\tests\vfs.rs:29
  11: core::ops::function::FnOnce::call_once<closure,()>
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\libcore\ops\function.rs:238
  12: test::run_test::{{closure}}
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libtest\lib.rs:1459
  13: test::run_test::{{closure}}
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libtest\lib.rs:1459
  14: test::run_test::{{closure}}
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libtest\lib.rs:1459
  15: panic_unwind::__rust_maybe_catch_panic
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libpanic_unwind\lib.rs:102
  16: std::panicking::try
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\libstd\panicking.rs:289
  17: std::panicking::try
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\libstd\panicking.rs:289
  18: std::panicking::try
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\libstd\panicking.rs:289
  19: std::panicking::try
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\libstd\panicking.rs:289
  20: std::thread::{{impl}}::spawn_unchecked::{{closure}}::{{closure}}
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\libstd\thread\mod.rs:477
  21: std::thread::{{impl}}::spawn_unchecked::{{closure}}::{{closure}}
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\libstd\thread\mod.rs:477
  22: std::thread::{{impl}}::spawn_unchecked::{{closure}}::{{closure}}
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\libstd\thread\mod.rs:477
  23: panic_unwind::__rust_maybe_catch_panic
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\/src\libpanic_unwind\lib.rs:102
  24: std::panicking::try
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\libstd\panicking.rs:289
  25: std::panicking::try
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\libstd\panicking.rs:289
  26: std::panicking::try
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\libstd\panicking.rs:289
  27: std::panicking::try
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\libstd\panicking.rs:289
  28: alloc::boxed::{{impl}}::call_once
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\liballoc\boxed.rs:683
  29: alloc::boxed::{{impl}}::call_once
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\liballoc\boxed.rs:683
  30: alloc::boxed::{{impl}}::call_once
             at /rustc/9fda7c2237db910e41d6a712e9a2139b352e558b\src\liballoc\boxed.rs:683
  31: BaseThreadInitThunk
  32: RtlUserThreadStart


failures:
    test_vfs_works

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out

error: test failed, to rerun pass '-p ra_vfs --test vfs'

@vemoo
Copy link
Contributor

vemoo commented Feb 13, 2019

But the event that we recieve from notify should be a rename event.
This is what I get on linux:

[crates/ra_vfs/src/io.rs:170] ("convert_notify_event", &event) = (
    "convert_notify_event",
    Rename(
        "/tmp/.tmpUHltrM/a/sub1/sub2/new.rs",
        "/tmp/.tmpUHltrM/a/sub1/sub2/new1.rs"
    )
)

@pnkfelix can you try to run the test in a normal folder (not in /var) by changing
https://github.com/rust-analyzer/rust-analyzer/blob/74d03d57e77b791b6973497f17a35136ddc295ab/crates/ra_vfs/tests/vfs.rs#L34

@kjeremy
Copy link
Contributor

kjeremy commented Feb 13, 2019

bors bot added a commit that referenced this issue Feb 13, 2019
818: In `RootConfig::contains`, check against canonicalized version of root path r=matklad a=pnkfelix

In `RootConfig::contains`, check against canonicalized version of root path since OS may hand us data that uses the canonical form rather than the root as specified by the user.

This is a step towards a resolution of issue #734 but does not completely fix the problem there.

Co-authored-by: Felix S. Klock II <[email protected]>
@pnkfelix
Copy link
Member

@vemoo

@pnkfelix can you try to run the test in a normal folder (not in /var) by changing
https://github.com/rust-analyzer/rust-analyzer/blob/74d03d57e77b791b6973497f17a35136ddc295ab/crates/ra_vfs/tests/vfs.rs#L34

yeah I just tried that by making the temp dir be just my home directory (in canonical form), and I still don't see a rename event; only a create of new1.rs

@pnkfelix
Copy link
Member

(but at this point I'm thinking the right next step is to reduce this to an issue with the notify crate, and file a bug there.)

@pnkfelix
Copy link
Member

@dfaust points out here that fsevents can merge events within a 30 second buffer.

That could explain why e.g. I am seeing remove events for bar in notify-rs/notify#181 even though it was just the target of a rename: on the test posted in that issue, within that 30s window, I had also removed the temporary directory structure that held the previous run of the test.

Possible work-arounds for this we might adopt in our own test suite:

  1. Add 30second delays in various places in the vfs.rs tests, namely after setting up the initial state we want to start from.
  2. Be more aggressive in assuring that different tests do not use the same paths; e.g. create more temporary directories.

@pnkfelix
Copy link
Member

see also relevant comments on this thread regarding rename handling being a mess on OS X: notify-rs/notify#89 (comment)

@matklad
Copy link
Member

matklad commented Feb 14, 2019

I wouldn't want to add huge delays: it's important that cargo test --all is fast on all OSes. So, if 2. works, I personally would prefer it.

@pnkfelix
Copy link
Member

pnkfelix commented Feb 15, 2019

Okay, update on current situation:

  • I don't think adding more temporary directories will solve testing on Mac OS X, at least not if we preserve the existing tests. Some of our tests, I think quite deliberately, are trying to recreate file system operations that will yield raw-events with combined actions that can mess up notify's current debouncing and also just look confusing from our side.
  • In my experiments, adding 35 second delays in the right spots can resolve almost all of my testing issues on OS X: but even with 35 second delays peppered in various places, there does still remain Renaming on OS X yields raw-events that confuse debouncer notify-rs/notify#181
    • Update: Okay, I admit that maybe I might have missed some spot that maybe I could have inserted a 35 second delay to also address Renaming on OS X yields raw-events that confuse debouncer notify-rs/notify#181. I don't know. It doesn't seem worth dissecting.
    • Hypothetically one might be able to revise vfs.rs to work with raw-events rather than debounced-events as a way to sidestep debouncing issues and ensure that we actually observe all the files that are touched. But I'm not going to take on that work.
    • And of course, 35 second delays really do yield a terrible testing experience.

So now I have another approach to suggest, that is more along the lines that was used to resolve #830: change the test suite to allow more flexible specification of the number of events we expect to receive on the task_receiver. I'll have a PR up shortly showing what I mean.

bors bot added a commit that referenced this issue Feb 15, 2019
837: Generalize vfs.rs test to address fsevents timing-dependent behavior. r=matklad a=pnkfelix

Generalize `tests/vfs.rs` processing to address wildly-varying timing-dependent behavior from fsevents (i.e. on Mac OS X).

Fix #734 

Co-authored-by: Felix S. Klock II <[email protected]>
@bors bors bot closed this as completed in #837 Feb 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
E-has-instructions Issue has some instructions and pointers to code to get started E-medium good first issue
Projects
None yet
Development

No branches or pull requests

6 participants