-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add watch_index
method and ark-cli watch
command
#36
Conversation
Benchmark for 341c426Click to view benchmark
|
It's a good PR, and it seems to be pretty straightforward to complete it, but I'm afraid that merging it before the other index refactorings could make porting ARK-Builders/arklib#72 too difficult. Because we'd need to add one more function to the index, and at the same time we need to check diffs while porting ARK-Builders/arklib#72. |
Benchmark for 0332bd7Click to view benchmark
|
Updated the watch API to call |
Benchmark for dc67cc3Click to view benchmark
|
There appear to be some unexpected events coming from the
This situation requires further investigation. Additionally, we need to test this scenario alongside other |
@tareknaser does only simultaneous deletion cause problems? Does simultaneous addition work fine? |
Benchmark for b8ed0bdClick to view benchmark
|
Yes and yes |
README should be updated to explicitly state in which folder this command should be run:
If |
Benchmark for 4fe7076Click to view benchmark
|
I added a note on how to run the example and mentioned that more can be done with |
Benchmark for 1bbb897Click to view benchmark
|
fs-index/src/watch.rs
Outdated
|
||
let relative_path = file.strip_prefix(&root_path)?; | ||
log::info!("Relative path: {:?}", relative_path); | ||
index.update_one(relative_path)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result should be used to provide user with actual updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, we have 2 approaches to choose from:
- Make
update_one
return sameIndexUpdate
type asupdate_all
(simple). Thenwatch_index
would return same type to the user. We could batch updates made in some interval to pack events together, but that's optional (if you find this idea useful, we can create a follow-up task). - Alternatively, we could specialize
update_one
, so Track API and Watch API would become more powerful comparing to Reactive API. The extra power I mean is more finely-grained events, similar to whatnotify-rs
provides: not only add/remove, but also rename/modify. This is more difficult though, and we would need to unify results fromupdate_all
andupdate_one
before returning fromwatch_index
. I suggest creating a follow-up issue for future consideration of this approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose the first approach to get things running faster and keep it simpler for now. I also plan to add tests soon.
Next, I want to add integration tests for this functionality. We could either add integration tests for fs-index
directly or implement CI shell scripts to test ark-cli watch <PATH>
for an end-to-end approach—possibly both?
Do you think CI shell scripts for ark-cli watch
would be sufficient, or should we also include programmatic tests? For example, running the watcher in a separate thread and doing many create/delete operations to verify the results. Now that I think about it, writing these tests programmatically could be complex. What’s your take?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest creating a follow-up issue for future consideration of this approach.
tracked in #89
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think CI shell scripts for ark-cli watch would be sufficient, or should we also include programmatic tests? For example, running the watcher in a separate thread and doing many create/delete operations to verify the results. Now that I think about it, writing these tests programmatically could be complex. What’s your take?
Agree, I think we can achieve proper result by simple shell script. I imagine it like this:
- Run
ark-cli watch
in background, direct its output to dedicated log file. - Randomly modify folder content and write the performed actions into another log file.
- Then compare the two log files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a shell script integration/ark-cli-watch.sh
to check the sanity of ark-cli watch
. I think it’s pretty cool because it also checks other parts of the code for sanity along the way, like update_one
in this case. Using ark-cli
is definitely a great way to write end-to-end shell scripts to verify what we have.
I’ve also set it up to run in the CI with each push/PR to the main branch. You can check out the expected workflow in my fork here. I’ve been using it for debugging
I spent some time today looking into different ways to use Order is very important in our case because we need events to happen in the right order (for example, we don’t want to see "
While looking through the examples, I also noticed that we might want to use the Debouncer. The file system can sometimes send multiple events for what is really just one change, which could cause problems. For example, it might trigger I'm now testing this in a smaller example and looking at how to set up the event stream properly. |
Benchmark for e4987f5Click to view benchmark
|
Benchmark for a95e06aClick to view benchmark
|
Nitpick, but we can define aliases
Then we could immediately return from
This should be more readable. By the way, it seems that we could simplify From API point of view, we don't need a collection of paths attached to the addition event, only one path (representative). We might need separate events to track duplicates. Something like |
I was actually looking at the code today and thought the same thing. The added field definitely has an interesting type, though I can’t quite remember why we landed on it—probably something inherited from older I think we could track this in a separate issue and handle it in its own PR. What do you think? |
Benchmark for 94fccc9Click to view benchmark
|
fs-index/src/index.rs
Outdated
result.removed.insert(id.item); | ||
} | ||
|
||
result.removed.insert(id.item); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need some test cases for this.
As well as test cases demonstrating what happens if the caller violates assumptions:
/// **Note**: The caller must ensure that:
/// - The index is up-to-date with the file system except for the updated
/// resource
/// - In case of a addition, the resource was not already in the index
/// - In case of a modification or removal, the resource was already in the
/// index
I'm thinking now that we should panic when the assumptions are violated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need some test cases for this.
Added a check in the existing test test_track_removal_with_collision
for this particular case.
As well as test cases demonstrating what happens if the caller violates assumptions:
update_one()
is meant to be a simpler, more targeted way to update the index compared to update_all()
. This comes with a few constraints and assumptions that the caller needs to handle. The caller must ensure the index state is as expected before calling update_one()
to avoid issues.
- For example, if
update_one()
is called on a non-existent file, it will panic with an error:
Caller must ensure that the resource exists in the index: "file.txt"
- If the caller mistakenly thinks a file was modified but it wasn’t,
update_one()
will still reindex it with no impact, as the information remains consistent.
Adding checks to enforce conditions—like verifying the index is current with the file system except for the updated resource—would essentially turn update_one()
into update_all()
, as this would require a full reindex. To keep update_one()
efficient, it’s better to leave it as is. We are including a clear warning in the documentation emphasizing that the caller should confirm these conditions. I think this is more than sufficient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding checks to enforce conditions—like verifying the index is current with the file system except for the updated resource—would essentially turn update_one() into update_all(), as this would require a full reindex. To keep update_one() efficient, it’s better to leave it as is.
Yeah, for sure we don't want to rescan the folder during update_one
. We could assert some simple invariants like that the id is present in the index, but this actually already done (implicitly).
We are including a clear warning in the documentation emphasizing that the caller should confirm these conditions. I think this is more than sufficient
Yes, but if we can make it fool-proof, that would be the best.
- I see that if user calls
update_one(removed_path)
and the id of removed file isn't thereunwrap
will panic, ensuring no inconsistent state is introduced. This is good, but even better would be to provide error message. User could forget to callupdate_one
when resource was introduced, but make the call when it was removed. - If user calls
update_one(added_path)
we insert the path intoself.id_to_paths.entry(id.clone()).or_default()
which handles both new addition and duplicate addition (regardless if the path was there or not). This looks good, not need to check anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good, but even better would be to provide error message. User could forget to call
update_one
when resource was introduced, but make the call when it was removed.
Updated the code to gracefully return an error if this case happens
I'm talking only about code clarity/maintainability. |
We can have multiple paths for same id, and we can have many ids when use
Agree, created issue: |
Benchmark for 4dd461cClick to view benchmark
|
I made a few updates to the PR:
|
Benchmark for 5c4137dClick to view benchmark
|
Benchmark for 4d69b11Click to view benchmark
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, great job
and enhance `update_one()` functionality Signed-off-by: Tarek <[email protected]>
Signed-off-by: Tarek <[email protected]>
…ate index Signed-off-by: Tarek <[email protected]>
Benchmark for 5031b4aClick to view benchmark
|
Description
This pull request adds a new method to
fs-index
crate,watch_index
, to monitor file system changes and automatically update the index.Additionally, it adds a new command to
ark-cli
to make this functionality accessible to users.This change is the first step of addressing issue #21.
Testing
An example of the new method's usage is in the
fs-index
crate atfs-index/examples/index_watch.rs
.To run the example, run the following command:
This command monitors the index at the
test-assets/
directory and automatically updates it upon any file system changes.