Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing events when file is created too soon after watcher is initialized #87

Open
J-N-K opened this issue Jan 4, 2023 · 3 comments
Open

Comments

@J-N-K
Copy link

J-N-K commented Jan 4, 2023

I came across a very nasty issue while working on openhab/openhab-core#3004 and unfortunately I don't have an easy test-case for that (I can't reproduce it locally, I only see it in GitHub CI builds, so I assume it's a timing issue on slower machines).

What we essentially do is to create a WatchService with the following code:

            dirWatcher = DirectoryWatcher.builder().listener(this).path(basePath).build();
            watchThread = new Thread(dirWatcher::watch, name);
            watchThread.start();

We then inject this WatchService into other OSGi services and they get notified about changes in the watched directory. We have integration tests that check if one of these services (FolderObserver) is correctly processing created and changed files:

https://github.com/openhab/openhab-core/blob/51d6b880ba6d40194dd731af80f32fa6ad7e22e7/itests/org.openhab.core.model.core.tests/src/main/java/org/openhab/core/model/core/internal/folder/FolderObserverTest.java

These tests failed quite often in CI builds because we did not receive the created event for files that have been created in the watched directory after the watch service was initialized. Adding Thread.sleep(1000) after the creation of the watch service immediately solved these issues, so I believe it's probably an issue with the initial hashing/initialization, even if I didn't find an obvious reason for thus behavior.

As I said: I'm unable to provide a simple test case for that but maybe someone comes around and has the same issue...

@gmethvin
Copy link
Owner

gmethvin commented Jan 4, 2023

Thanks for the report @J-N-K. I'm guessing that's actually because when you call watch() it still needs to traverse the directory to get initial hashes of all the files before it can start watching, and there's no way to know when it finishes the initialization phase and starts the actual listening phase.

You may be able to use the watchAsync method instead, which actually does the initialization synchronously and returns a future as soon as the watcher starts watching. I think something like this would be roughly equivalent to the code you have:

dirWatcher = DirectoryWatcher.builder().listener(this).path(basePath).build();
executor = Executors.newSingleThreadExecutor();
dirWatcher.watchAsync(executor);

then you should be able to shut it down with:

dirWatcher.close();
executor.shutdown();

Of course that has the downside of having to block during the initialization phase, but you could wrap that in another future perhaps and provide some async method of seeing when it starts watching. You might also consider disabling hashing, which makes this initialization phase much faster, though it increases the likelihood of duplicate events.

Nevertheless I think it's an issue that watch() provides no way of knowing when the initialization phase ends and the actual watching starts. We'd have to think about the best way to communicate this. Perhaps it'd make sense to provide a signal to the listener in that case. Let me know if you have any ideas on the best approach here.

@J-N-K
Copy link
Author

J-N-K commented Jan 5, 2023

I have refactored a bit to use .watchAsync (which was not so easy, because we can't block in service activation) and it seems to avoid these issues. Unfortunately I don't have good idea how to signal that.

@gmethvin
Copy link
Owner

gmethvin commented Jan 9, 2023

I can't think of any way around this problem if you want hashing. I guess we could start watching while we're doing the initial hashing of the files, then just emit the events as-is if we haven't hashed the file yet. Does your use case require the hashing functionality?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants