Add `watch_index` method and `ark-cli watch` command #36

tareknaser · 2024-04-22T00:54:59Z

Description

This pull request adds a new method, watch_index, to the fs-index crate. It monitors file system changes and automatically updates the index.
It also adds a new command to ark-cli that makes this functionality accessible to users.
This change is a first step toward addressing issue #21.

Testing

An example of the new method's usage is in the fs-index crate at fs-index/examples/index_watch.rs.
To run the example, run the following command:

cargo run --example index_watch

This command watches the test-assets/ directory and automatically updates its index whenever the file system changes.

github-actions · 2024-04-22T01:00:41Z

Benchmark for `341c426`

Test	Base	PR	%
../test-assets/lena.jpg/compute_bytes	13.6±0.51µs	13.3±0.09µs	-2.21%
../test-assets/test.pdf/compute_bytes	139.0±2.61µs	107.6±0.80µs	-22.59%
compute_bytes_large/compute_bytes	467.9±9.08µs	139.9±1.85µs	-70.10%
compute_bytes_medium/compute_bytes	26.8±0.25µs	27.7±0.79µs	+3.36%
compute_bytes_small/compute_bytes	127.2±1.07ns	128.0±6.04ns	+0.63%
index_build/index_build/../test-assets/	161.3±5.81µs	160.5±1.53µs	-0.50%

fs-index/examples/index_watch.rs

fs-index/src/watch.rs

kirillt · 2024-05-11T14:32:56Z

It's a good PR, and it seems pretty straightforward to complete, but I'm afraid that merging it before the other index refactorings could make porting ARK-Builders/arklib#72 too difficult: we'd need to add one more function to the index while also checking diffs during the port of ARK-Builders/arklib#72.

github-actions · 2024-05-12T06:20:04Z

Benchmark for `0332bd7`

Test	Base	PR	%
../test-assets/lena.jpg/compute_bytes	13.3±0.16µs	13.3±0.08µs	0.00%
../test-assets/test.pdf/compute_bytes	109.8±2.17µs	111.6±0.55µs	+1.64%
compute_bytes_large/compute_bytes	471.0±0.78µs	139.5±3.27µs	-70.38%
compute_bytes_medium/compute_bytes	30.9±0.20µs	27.7±0.21µs	-10.36%
compute_bytes_small/compute_bytes	127.8±1.66ns	128.2±3.58ns	+0.31%
index_build/index_build/../test-assets/	163.0±1.45µs	161.0±0.53µs	-1.23%

tareknaser · 2024-09-09T09:23:15Z

Updated the watch API to call ResourceIndex::update_one() for files created, removed, or modified based on streams from notify events.
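
A minimal sketch of that dispatch, using only the standard library. `FsEvent` and `ToyIndex` are hypothetical stand-ins for notify's event types and for `ResourceIndex`, and the `exists` flag replaces a real file system check; none of these names come from the PR.

```rust
use std::collections::BTreeSet;
use std::path::{Path, PathBuf};

/// Hypothetical, simplified stand-in for the event kinds a watcher reports.
#[derive(Debug, Clone, PartialEq)]
enum FsEvent {
    Created(PathBuf),
    Modified(PathBuf),
    Removed(PathBuf),
}

/// Toy index that only remembers which relative paths it has seen.
#[derive(Debug, Default)]
struct ToyIndex {
    paths: BTreeSet<PathBuf>,
}

impl ToyIndex {
    /// Stand-in for `update_one`: `exists` says whether the file is still
    /// on disk, which decides between an addition and a removal.
    fn update_one(&mut self, relative: &Path, exists: bool) {
        if exists {
            self.paths.insert(relative.to_path_buf());
        } else {
            self.paths.remove(relative);
        }
    }
}

/// Every event kind funnels into the same single-path update.
fn handle_event(index: &mut ToyIndex, event: &FsEvent) {
    match event {
        FsEvent::Created(p) | FsEvent::Modified(p) => index.update_one(p, true),
        FsEvent::Removed(p) => index.update_one(p, false),
    }
}
```

The point of the shape is that the watcher itself stays thin: it only translates events into single-path updates and leaves all index logic in one place.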

github-actions · 2024-09-09T09:41:05Z

Benchmark for `dc67cc3`

Test	Base	PR	%
blake3_resource_id_creation/compute_from_bytes:large	249.7±1.19µs	247.7±0.87µs	-0.80%
blake3_resource_id_creation/compute_from_bytes:medium	15.5±0.06µs	15.6±0.08µs	+0.65%
blake3_resource_id_creation/compute_from_bytes:small	1350.7±8.76ns	1358.8±6.23ns	+0.60%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg	197.2±0.54µs	197.0±0.67µs	-0.10%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf	1757.1±7.55µs	1763.0±13.37µs	+0.34%
crc32_resource_id_creation/compute_from_bytes:large	86.7±0.24µs	86.8±0.34µs	+0.12%
crc32_resource_id_creation/compute_from_bytes:medium	5.4±0.01µs	5.4±0.03µs	0.00%
crc32_resource_id_creation/compute_from_bytes:small	92.4±0.55ns	92.4±0.33ns	0.00%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg	64.8±0.29µs	64.8±0.86µs	0.00%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf	946.9±3.53µs	949.6±4.11µs	+0.29%
resource_index/index_build//tmp/ark-fs-index-benchmarks94k72W	106.6±3.35ms	N/A	N/A
resource_index/index_build//tmp/ark-fs-index-benchmarksYCHMXF	105.0±2.17ms	N/A	N/A
resource_index/index_get_resource_by_id	97.1±0.25ns	99.2±0.37ns	+2.16%
resource_index/index_get_resource_by_path	52.8±0.26ns	55.1±0.33ns	+4.36%
resource_index/index_update_all	1135.9±41.95ms	1137.9±59.79ms	+0.18%
resource_index/index_update_one	684.1±33.46ms	693.3±33.36ms	+1.34%

tareknaser · 2024-09-09T11:10:16Z

There appear to be some unexpected events coming from the notify stream. For example, I've identified a potential flaw with the following steps:

Run the watch API on a folder.
Copy a file multiple times (e.g., file copy.txt, file copy 2.txt).
Up until this point, the index updates correctly.
Delete both files simultaneously (select and delete them together).
This results in a panic in ResourceIndex::update_one().

This situation requires further investigation. Additionally, we need to test this scenario alongside other ResourceIndex tests to be implemented for #88.

kirillt · 2024-09-09T11:27:31Z

@tareknaser does only simultaneous deletion cause problems? Does simultaneous addition work fine?

github-actions · 2024-09-09T11:30:57Z

Benchmark for `b8ed0bd`

Test	Base	PR	%
blake3_resource_id_creation/compute_from_bytes:large	250.8±0.85µs	249.0±1.70µs	-0.72%
blake3_resource_id_creation/compute_from_bytes:medium	15.5±0.03µs	15.6±0.04µs	+0.65%
blake3_resource_id_creation/compute_from_bytes:small	1357.4±3.61ns	1363.3±8.30ns	+0.43%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg	197.8±2.52µs	197.6±0.65µs	-0.10%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf	1762.4±4.96µs	1768.8±36.65µs	+0.36%
crc32_resource_id_creation/compute_from_bytes:large	86.9±0.69µs	86.9±0.42µs	0.00%
crc32_resource_id_creation/compute_from_bytes:medium	5.4±0.01µs	5.4±0.02µs	0.00%
crc32_resource_id_creation/compute_from_bytes:small	92.4±0.70ns	92.7±1.67ns	+0.32%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg	64.5±0.27µs	64.9±1.47µs	+0.62%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf	945.7±4.82µs	946.3±5.29µs	+0.06%
resource_index/index_build//tmp/ark-fs-index-benchmarks61KWbS	106.6±1.98ms	N/A	N/A
resource_index/index_build//tmp/ark-fs-index-benchmarksLAAoc8	111.8±0.74ms	N/A	N/A
resource_index/index_get_resource_by_id	97.1±0.37ns	96.7±0.50ns	-0.41%
resource_index/index_get_resource_by_path	52.6±0.24ns	52.7±0.25ns	+0.19%
resource_index/index_update_all	1089.8±34.10ms	1115.0±32.55ms	+2.31%
resource_index/index_update_one	669.3±24.95ms	660.4±22.62ms	-1.33%

fs-index/src/watch.rs

tareknaser · 2024-09-09T11:49:09Z

> does only simultaneous deletion cause problems? Does simultaneous addition work fine?

Yes and yes.
Even simultaneous deletion works fine in some cases, but I was able to reproduce the error more than once.

kirillt · 2024-09-09T11:55:37Z

README should be updated to explicitly state in which folder this command should be run:

cargo run --example resource_index

If ark-cli can handle a similar scenario, it should be mentioned in the README, too.

github-actions · 2024-09-09T12:12:17Z

Benchmark for `4fe7076`

Test	Base	PR	%
blake3_resource_id_creation/compute_from_bytes:large	248.4±1.85µs	250.4±3.74µs	+0.81%
blake3_resource_id_creation/compute_from_bytes:medium	15.6±0.15µs	16.9±0.17µs	+8.33%
blake3_resource_id_creation/compute_from_bytes:small	1360.0±4.45ns	1356.6±5.93ns	-0.25%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg	197.4±1.32µs	197.7±1.08µs	+0.15%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf	1757.9±10.11µs	1769.9±21.12µs	+0.68%
crc32_resource_id_creation/compute_from_bytes:large	87.0±0.89µs	86.8±0.62µs	-0.23%
crc32_resource_id_creation/compute_from_bytes:medium	5.4±0.06µs	5.4±0.09µs	0.00%
crc32_resource_id_creation/compute_from_bytes:small	92.6±1.26ns	92.6±1.39ns	0.00%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg	65.0±0.47µs	65.0±0.57µs	0.00%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf	953.0±24.12µs	967.2±2.47µs	+1.49%
resource_index/index_build//tmp/ark-fs-index-benchmarks0HA7fz	106.8±2.43ms	N/A	N/A
resource_index/index_build//tmp/ark-fs-index-benchmarksf1Yq2s	112.6±1.21ms	N/A	N/A
resource_index/index_get_resource_by_id	97.4±0.67ns	94.9±1.11ns	-2.57%
resource_index/index_get_resource_by_path	52.9±0.60ns	50.2±0.26ns	-5.10%
resource_index/index_update_all	1091.9±36.89ms	1118.4±43.34ms	+2.43%
resource_index/index_update_one	653.8±22.83ms	668.3±19.57ms	+2.22%

fs-index/src/watch.rs

tareknaser · 2024-09-10T11:03:13Z

> README should be updated to explicitly state in which folder this command should be run:

I added a note on how to run the example and mentioned that more can be done with ark-cli watch.

github-actions · 2024-09-10T11:23:49Z

Benchmark for `1bbb897`

Test	Base	PR	%
blake3_resource_id_creation/compute_from_bytes:large	249.6±1.44µs	249.2±1.63µs	-0.16%
blake3_resource_id_creation/compute_from_bytes:medium	15.5±0.06µs	15.5±0.06µs	0.00%
blake3_resource_id_creation/compute_from_bytes:small	1362.9±7.31ns	1357.6±7.05ns	-0.39%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg	197.5±0.41µs	197.5±0.85µs	0.00%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf	1760.6±8.18µs	1770.1±29.87µs	+0.54%
crc32_resource_id_creation/compute_from_bytes:large	86.6±0.29µs	86.8±0.19µs	+0.23%
crc32_resource_id_creation/compute_from_bytes:medium	5.4±0.03µs	5.4±0.06µs	0.00%
crc32_resource_id_creation/compute_from_bytes:small	92.4±0.54ns	92.3±0.30ns	-0.11%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg	64.5±0.30µs	64.9±0.52µs	+0.62%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf	947.4±4.87µs	952.9±3.74µs	+0.58%
resource_index/index_build//tmp/ark-fs-index-benchmarksdlp6ac	117.7±2.70ms	N/A	N/A
resource_index/index_build//tmp/ark-fs-index-benchmarksiBuvQD	114.6±2.21ms	N/A	N/A
resource_index/index_get_resource_by_id	96.8±0.37ns	98.4±0.45ns	+1.65%
resource_index/index_get_resource_by_path	52.6±0.15ns	54.4±0.39ns	+3.42%
resource_index/index_update_all	1134.7±54.53ms	1169.1±51.93ms	+3.03%
resource_index/index_update_one	688.0±29.08ms	701.6±31.71ms	+1.98%

fs-index/src/watch.rs

kirillt · 2024-09-10T13:08:33Z

fs-index/src/watch.rs

+
+                    let relative_path = file.strip_prefix(&root_path)?;
+                    log::info!("Relative path: {:?}", relative_path);
+                    index.update_one(relative_path)?;


The result should be used to provide the user with the actual updates.

So, we have 2 approaches to choose from:

1. Make update_one return the same IndexUpdate type as update_all (simple). Then watch_index would return the same type to the user. We could batch the updates made within some interval to pack events together, but that's optional (if you find this idea useful, we can create a follow-up task).

2. Alternatively, we could specialize update_one, so the Track API and Watch API would become more powerful compared to the Reactive API. By extra power I mean more fine-grained events, similar to what notify-rs provides: not only add/remove, but also rename/modify. This is more difficult though, and we would need to unify the results from update_all and update_one before returning from watch_index. I suggest creating a follow-up issue for future consideration of this approach.
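
The optional batching idea from the first approach could look roughly like the sketch below. It assumes a simplified `IndexUpdate` with plain `u64` ids and `PathBuf` paths, and the `merge` and `batch` helpers are hypothetical; the "later event wins per id" policy is only one possible choice, not something decided in this thread.

```rust
use std::collections::{HashMap, HashSet};
use std::path::PathBuf;

/// Simplified id type; the real index uses hash-based resource ids.
type Id = u64;

/// Simplified shape shared by `update_all` and `update_one` results.
#[derive(Debug, Default, PartialEq)]
struct IndexUpdate {
    added: HashMap<Id, HashSet<PathBuf>>,
    removed: HashSet<Id>,
}

impl IndexUpdate {
    /// Folds a later update into this one. Assumed policy: for each id,
    /// the later event overrides whatever was recorded before.
    fn merge(&mut self, later: IndexUpdate) {
        for id in later.removed {
            self.added.remove(&id);
            self.removed.insert(id);
        }
        for (id, paths) in later.added {
            self.removed.remove(&id);
            self.added.entry(id).or_default().extend(paths);
        }
    }
}

/// Packs all single-path updates collected during one interval.
fn batch(updates: Vec<IndexUpdate>) -> IndexUpdate {
    let mut packed = IndexUpdate::default();
    for update in updates {
        packed.merge(update);
    }
    packed
}
```

Because both functions return the same type, a caller of watch_index would not need to know whether an update came from a single event or from a packed interval.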

I chose the first approach to get things running faster and keep it simpler for now. I also plan to add tests soon.

Next, I want to add integration tests for this functionality. We could either add integration tests for fs-index directly or implement CI shell scripts to test ark-cli watch <PATH> for an end-to-end approach—possibly both?

Do you think CI shell scripts for ark-cli watch would be sufficient, or should we also include programmatic tests? For example, running the watcher in a separate thread and doing many create/delete operations to verify the results. Now that I think about it, writing these tests programmatically could be complex. What’s your take?

> I suggest creating a follow-up issue for future consideration of this approach.

tracked in #89

> Do you think CI shell scripts for ark-cli watch would be sufficient, or should we also include programmatic tests? For example, running the watcher in a separate thread and doing many create/delete operations to verify the results. Now that I think about it, writing these tests programmatically could be complex. What’s your take?

Agreed, I think we can get a proper result with a simple shell script. I imagine it like this:

Run ark-cli watch in background, direct its output to dedicated log file.

Randomly modify folder content and write the performed actions into another log file.

Then compare the two log files.

Added a shell script integration/ark-cli-watch.sh to check the sanity of ark-cli watch. I think it’s pretty cool because it also checks other parts of the code for sanity along the way, like update_one in this case. Using ark-cli is definitely a great way to write end-to-end shell scripts to verify what we have.

I’ve also set it up to run in CI on each push/PR to the main branch. You can check out the expected workflow in my fork. I’ve been using it for debugging.

fs-index/Cargo.toml

tareknaser · 2024-09-11T20:16:22Z

I spent some time today looking into different ways to use notify by going through the docs and examples. Right now, we're using async_monitor, but it might not be the best choice for us.

Order is very important in our case because we need events to happen in the right order (for example, we don’t want to see "file1.txt removed" before "file1.txt created", since this would mess up our update_one() logic). Using an asynchronous watcher could cause issues with keeping the events in order.

Btw, I think this might be why we saw this error. As I reported, the error wasn’t consistent, which could be because asynchronous events don't always happen in the right order.

While looking through the examples, I also noticed that we might want to use the Debouncer. The file system can sometimes send multiple events for what is really just one change, which could cause problems. For example, it might trigger update_one() several times when a file is created.

I'm now testing this in a smaller example and looking at how to set up the event stream properly.
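
To make the two concerns concrete, here is a small hand-rolled sketch of order-preserving debouncing. It is not notify's Debouncer; `TimedEvent` and `debounce` are hypothetical names, and event kinds are plain strings for brevity. Events are processed strictly in arrival order, and a repeat of the same kind for the same path within the window is dropped.

```rust
use std::path::PathBuf;
use std::time::Duration;

/// Hypothetical event record with an explicit timestamp.
#[derive(Debug, Clone, PartialEq)]
struct TimedEvent {
    at: Duration,       // time since the watcher started
    path: PathBuf,
    kind: &'static str, // e.g. "create", "modify", "remove"
}

/// Walks events in arrival order and drops an event when the most recent
/// kept event for the same path has the same kind and lies within `window`.
/// The relative order of everything that is kept is preserved.
fn debounce(events: &[TimedEvent], window: Duration) -> Vec<TimedEvent> {
    let mut kept: Vec<TimedEvent> = Vec::new();
    for event in events {
        let is_duplicate = kept
            .iter()
            .rev()
            .find(|k| k.path == event.path)
            .map_or(false, |last| {
                last.kind == event.kind && event.at.saturating_sub(last.at) <= window
            });
        if !is_duplicate {
            kept.push(event.clone());
        }
    }
    kept
}
```

With this shape, a burst of "create" events for one file collapses into a single update, while a "create" followed by a "remove" still reaches the index in that order.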

github-actions · 2024-09-16T13:49:19Z

Benchmark for `e4987f5`

Test	Base	PR	%
blake3_resource_id_creation/compute_from_bytes:large	249.1±0.61µs	248.5±2.46µs	-0.24%
blake3_resource_id_creation/compute_from_bytes:medium	15.8±1.14µs	15.6±0.12µs	-1.27%
blake3_resource_id_creation/compute_from_bytes:small	1364.5±2.05ns	1365.9±1.64ns	+0.10%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg	197.3±3.99µs	197.1±3.13µs	-0.10%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf	1699.3±3.73µs	1718.3±22.19µs	+1.12%
crc32_resource_id_creation/compute_from_bytes:large	86.7±1.35µs	86.8±1.13µs	+0.12%
crc32_resource_id_creation/compute_from_bytes:medium	5.4±0.13µs	5.4±0.01µs	0.00%
crc32_resource_id_creation/compute_from_bytes:small	92.3±0.19ns	92.3±0.30ns	0.00%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg	64.1±0.32µs	64.2±0.60µs	+0.16%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf	953.1±88.59µs	933.0±5.78µs	-2.11%
resource_index/index_build//tmp/ark-fs-index-benchmarkst4cPto	109.9±1.24ms	N/A	N/A
resource_index/index_build//tmp/ark-fs-index-benchmarkszKpEve	105.9±3.20ms	N/A	N/A
resource_index/index_get_resource_by_id	99.6±0.38ns	95.0±2.01ns	-4.62%
resource_index/index_get_resource_by_path	55.6±2.38ns	50.6±0.75ns	-8.99%
resource_index/index_update_all	1117.9±32.54ms	1125.5±52.37ms	+0.68%
resource_index/index_update_one	666.8±18.20ms	667.6±28.00ms	+0.12%

github-actions · 2024-09-16T14:41:34Z

Benchmark for `a95e06a`

Test	Base	PR	%
blake3_resource_id_creation/compute_from_bytes:large	251.3±1.17µs	249.3±0.46µs	-0.80%
blake3_resource_id_creation/compute_from_bytes:medium	15.6±0.06µs	15.6±0.07µs	0.00%
blake3_resource_id_creation/compute_from_bytes:small	1362.8±6.87ns	1355.7±8.95ns	-0.52%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg	196.8±0.35µs	196.8±0.52µs	0.00%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf	1705.4±4.30µs	1703.1±20.62µs	-0.13%
crc32_resource_id_creation/compute_from_bytes:large	86.8±0.38µs	86.7±0.15µs	-0.12%
crc32_resource_id_creation/compute_from_bytes:medium	5.4±0.03µs	5.4±0.05µs	0.00%
crc32_resource_id_creation/compute_from_bytes:small	92.3±0.18ns	92.4±0.48ns	+0.11%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg	64.5±0.37µs	64.7±1.40µs	+0.31%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf	939.6±4.55µs	941.8±11.76µs	+0.23%
resource_index/index_build//tmp/ark-fs-index-benchmarks4VHJsr	109.1±2.05ms	N/A	N/A
resource_index/index_build//tmp/ark-fs-index-benchmarksjA6AY8	108.2±2.18ms	N/A	N/A
resource_index/index_get_resource_by_id	100.5±0.31ns	94.4±0.22ns	-6.07%
resource_index/index_get_resource_by_path	53.8±0.30ns	50.4±0.18ns	-6.32%
resource_index/index_update_all	1114.8±29.19ms	1118.7±41.25ms	+0.35%
resource_index/index_update_one	660.5±20.57ms	665.9±23.63ms	+0.82%

fs-index/src/index.rs

kirillt · 2024-09-16T22:02:36Z

Nitpick, but we can define aliases IndexUpdate::addition(id, path) and IndexUpdate::removal(id) for these snippets:

result.removed.insert(id.item);

result
    .added
    .insert(id, HashSet::from([timpestamped_path]));

Then we could immediately return from update_one once we determined the update:

return Ok(IndexUpdate::removal(id.item));

return Ok(IndexUpdate::addition(id, timpestamped_path));

This should be more readable.
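
A rough sketch of what such constructors could look like, assuming a simplified `IndexUpdate` with `u64` ids and plain `PathBuf` values instead of the timestamped paths used in the crate. The `describe_change` function and its `still_exists` flag are hypothetical and only illustrate the early returns.

```rust
use std::collections::{HashMap, HashSet};
use std::path::PathBuf;

/// Simplified id type; the real index uses hash-based resource ids.
type Id = u64;

#[derive(Debug, Default, PartialEq)]
struct IndexUpdate {
    added: HashMap<Id, HashSet<PathBuf>>,
    removed: HashSet<Id>,
}

impl IndexUpdate {
    /// Update that reports a single added resource.
    fn addition(id: Id, path: PathBuf) -> Self {
        IndexUpdate {
            added: HashMap::from([(id, HashSet::from([path]))]),
            removed: HashSet::new(),
        }
    }

    /// Update that reports a single removed resource.
    fn removal(id: Id) -> Self {
        IndexUpdate {
            added: HashMap::new(),
            removed: HashSet::from([id]),
        }
    }
}

/// Hypothetical caller: returns as soon as the kind of change is known.
fn describe_change(id: Id, path: PathBuf, still_exists: bool) -> IndexUpdate {
    if !still_exists {
        return IndexUpdate::removal(id);
    }
    IndexUpdate::addition(id, path)
}
```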

By the way, it seems that we could simplify the added field of the IndexUpdate structure. Since we don't distinguish duplicates, we can take any path as a representative of the group, so the app could do something with it. In practice, when a unique resource is detected, we take its path as the representative. When a duplicate appears, we skip it. If several paths were introduced at once during a unique addition, we take an arbitrary one (options: 1) random; 2) just the first in the vector; 3) the shortest path).

From the API point of view, we don't need a collection of paths attached to the addition event, only one path (the representative).

We might need separate events to track duplicates, something like DuplicateAdded(id, path) and DuplicateRemoved(id, path). Although I'm not sure that duplicate removal can be useful; maybe duplicate addition is enough. It could be used to let the user select the representative manually. Just an idea for the future.
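
As an illustration only, the representative idea and a separate duplicate event might be sketched as follows. `IndexEvent`, `shortest_representative`, and `on_path_added` are hypothetical names, ids are simplified to `u64`, and "shortest" is taken to mean shortest path string with a lexicographic tie-break, which is an assumption.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Simplified id type; the real index uses hash-based resource ids.
type Id = u64;

/// Hypothetical events: one representative per resource, plus a separate
/// event for every further path that has the same id.
#[derive(Debug, PartialEq)]
enum IndexEvent {
    Added { id: Id, representative: PathBuf },
    DuplicateAdded { id: Id, path: PathBuf },
}

/// Option 3 from the comment: pick the shortest path, breaking ties
/// lexicographically so the choice is deterministic.
fn shortest_representative(paths: &[PathBuf]) -> Option<&Path> {
    paths
        .iter()
        .min_by(|a, b| {
            a.as_os_str()
                .len()
                .cmp(&b.as_os_str().len())
                .then_with(|| a.cmp(b))
        })
        .map(|p| p.as_path())
}

/// Records `path` under `id` and reports whether it is the first path
/// (a unique addition) or a duplicate of an already known resource.
fn on_path_added(known: &mut HashMap<Id, Vec<PathBuf>>, id: Id, path: PathBuf) -> IndexEvent {
    let paths = known.entry(id).or_default();
    let is_first = paths.is_empty();
    paths.push(path.clone());
    if is_first {
        IndexEvent::Added { id, representative: path }
    } else {
        IndexEvent::DuplicateAdded { id, path }
    }
}
```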

tareknaser · 2024-10-29T14:03:48Z

> Then we could immediately return from update_one once we determined the update:

update_one() currently has an if statement that checks early on if the update is a removal or an addition, which makes it able to return from each branch as soon as possible.

> By the way, it seems that we could simplify added field of the IndexUpdate structure.

I was actually looking at the code today and thought the same thing. The added field definitely has an interesting type, though I can’t quite remember why we landed on it—probably something inherited from older arklib code.

I think we could track this in a separate issue and handle it in its own PR. What do you think?

github-actions · 2024-10-29T15:45:15Z

Benchmark for `94fccc9`

Test	Base	PR	%
blake3_resource_id_creation/compute_from_bytes:large	249.8±1.07µs	248.6±0.77µs	-0.48%
blake3_resource_id_creation/compute_from_bytes:medium	15.5±0.06µs	15.6±0.07µs	+0.65%
blake3_resource_id_creation/compute_from_bytes:small	1361.1±2.14ns	1361.1±2.28ns	0.00%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg	196.2±1.30µs	196.0±0.83µs	-0.10%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf	1664.5±5.23µs	1679.8±33.90µs	+0.92%
crc32_resource_id_creation/compute_from_bytes:large	86.9±0.36µs	87.0±2.44µs	+0.12%
crc32_resource_id_creation/compute_from_bytes:medium	5.4±0.03µs	5.4±0.01µs	0.00%
crc32_resource_id_creation/compute_from_bytes:small	92.3±0.37ns	92.4±0.46ns	+0.11%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg	64.1±0.33µs	64.0±0.37µs	-0.16%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf	908.8±5.02µs	911.2±5.90µs	+0.26%
resource_index/index_build//tmp/ark-fs-index-benchmarksRURZ2N	108.3±1.83ms	N/A	N/A
resource_index/index_build//tmp/ark-fs-index-benchmarksVbGibi	108.1±1.37ms	N/A	N/A
resource_index/index_get_resource_by_id	96.6±0.43ns	97.5±1.03ns	+0.93%
resource_index/index_get_resource_by_path	53.6±0.27ns	55.5±0.42ns	+3.54%
resource_index/index_update_all	1102.3±38.40ms	1116.2±38.74ms	+1.26%
resource_index/index_update_one	668.0±23.43ms	676.8±22.17ms	+1.32%

kirillt · 2024-10-30T08:02:17Z

fs-index/src/index.rs

+                result.removed.insert(id.item);
            }
-
-            result.removed.insert(id.item);


We need some test cases for this.

As well as test cases demonstrating what happens if the caller violates assumptions:

/// **Note**: The caller must ensure that:
/// - The index is up-to-date with the file system except for the updated
///   resource
/// - In case of a addition, the resource was not already in the index
/// - In case of a modification or removal, the resource was already in the
///   index

I'm thinking now that we should panic when the assumptions are violated.

> We need some test cases for this.

Added a check in the existing test test_track_removal_with_collision for this particular case.

> As well as test cases demonstrating what happens if the caller violates assumptions:

update_one() is meant to be a simpler, more targeted way to update the index compared to update_all(). This comes with a few constraints and assumptions that the caller needs to handle. The caller must ensure the index state is as expected before calling update_one() to avoid issues.

For example, if update_one() is called on a non-existent file, it will panic with an error:

Caller must ensure that the resource exists in the index: "file.txt"

If the caller mistakenly thinks a file was modified but it wasn’t, update_one() will still reindex it with no impact, as the information remains consistent.

Adding checks to enforce conditions, like verifying the index is current with the file system except for the updated resource, would essentially turn update_one() into update_all(), as this would require a full reindex. To keep update_one() efficient, it’s better to leave it as is. We are including a clear warning in the documentation emphasizing that the caller should confirm these conditions. I think this is more than sufficient.

> Adding checks to enforce conditions, like verifying the index is current with the file system except for the updated resource, would essentially turn update_one() into update_all(), as this would require a full reindex. To keep update_one() efficient, it’s better to leave it as is.

Yeah, for sure we don't want to rescan the folder during update_one. We could assert some simple invariants, like that the id is present in the index, but this is actually already done (implicitly).

> We are including a clear warning in the documentation emphasizing that the caller should confirm these conditions. I think this is more than sufficient

Yes, but if we can make it fool-proof, that would be the best.

I see that if the user calls update_one(removed_path) and the id of the removed file isn't there, unwrap will panic, ensuring no inconsistent state is introduced. This is good, but even better would be to provide an error message. The user could forget to call update_one when the resource was introduced, but make the call when it was removed.

If the user calls update_one(added_path), we insert the path into self.id_to_paths.entry(id.clone()).or_default(), which handles both a new addition and a duplicate addition (regardless of whether the path was there or not). This looks good, no need to check anything.

> This is good, but even better would be to provide an error message. The user could forget to call update_one when the resource was introduced, but make the call when it was removed.

Updated the code to gracefully return an error if this case happens.
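
A minimal sketch of the error-instead-of-panic idea, under the assumption of a toy index keyed by path. `ToyIndex`, `UpdateError`, and `remove_one` are hypothetical names, not the crate's API; the message text mirrors the panic message quoted earlier in this thread.

```rust
use std::collections::HashMap;
use std::fmt;
use std::path::{Path, PathBuf};

/// Simplified id type; the real index uses hash-based resource ids.
type Id = u64;

/// Hypothetical error for a removal of a path the index never saw.
#[derive(Debug, PartialEq)]
enum UpdateError {
    NotInIndex(PathBuf),
}

impl fmt::Display for UpdateError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            UpdateError::NotInIndex(path) => write!(
                f,
                "Caller must ensure that the resource exists in the index: {:?}",
                path
            ),
        }
    }
}

impl std::error::Error for UpdateError {}

/// Toy index mapping relative paths to resource ids.
#[derive(Debug, Default)]
struct ToyIndex {
    path_to_id: HashMap<PathBuf, Id>,
}

impl ToyIndex {
    /// Removal branch of a single-path update: returns the removed id,
    /// or an error (instead of panicking) when the path is unknown.
    fn remove_one(&mut self, path: &Path) -> Result<Id, UpdateError> {
        self.path_to_id
            .remove(path)
            .ok_or_else(|| UpdateError::NotInIndex(path.to_path_buf()))
    }
}
```

Returning a `Result` leaves the index untouched on failure, so the caller can decide whether to log, retry with a full update, or abort.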

integration/ark-cli-watch.sh

kirillt · 2024-10-30T08:05:22Z

> > Then we could immediately return from update_one once we determined the update:

> update_one() currently has an if statement that checks early on if the update is a removal or an addition, which makes it able to return from each branch as soon as possible.

I'm talking only about code clarity/maintainability.

kirillt · 2024-10-30T08:17:38Z

I was actually looking at the code today and thought the same thing. The added field definitely has an interesting type, though I can’t quite remember why we landed on it—probably something inherited from older arklib code.

We can have multiple paths for the same id, and we can have many ids when we use update_all.

I think we could track this in a separate issue and handle it in its own PR. What do you think?

Agree, created issue:

Representative resources and dedicated events for duplicates/collisions #93

github-actions · 2024-10-31T17:31:45Z

Benchmark for `4dd461c`

Test	Base	PR	%
blake3_resource_id_creation/compute_from_bytes:large	250.4±2.08µs	249.7±2.22µs	-0.28%
blake3_resource_id_creation/compute_from_bytes:medium	15.6±0.13µs	15.6±0.35µs	0.00%
blake3_resource_id_creation/compute_from_bytes:small	1357.0±10.67ns	1362.7±5.55ns	+0.42%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg	196.3±1.19µs	196.1±1.11µs	-0.10%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf	1672.1±22.35µs	1670.4±18.00µs	-0.10%
crc32_resource_id_creation/compute_from_bytes:large	86.8±0.45µs	86.9±0.42µs	+0.12%
crc32_resource_id_creation/compute_from_bytes:medium	5.4±0.02µs	5.4±0.03µs	0.00%
crc32_resource_id_creation/compute_from_bytes:small	92.9±2.24ns	92.4±0.47ns	-0.54%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg	64.7±0.70µs	64.2±0.56µs	-0.77%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf	907.9±5.89µs	908.4±5.79µs	+0.06%
resource_index/index_build//tmp/ark-fs-index-benchmarksDcxEpT	113.5±1.25ms	N/A	N/A
resource_index/index_build//tmp/ark-fs-index-benchmarksIfqZ8c	106.8±2.65ms	N/A	N/A
resource_index/index_get_resource_by_id	96.7±0.71ns	97.4±1.06ns	+0.72%
resource_index/index_get_resource_by_path	55.2±2.57ns	55.8±0.75ns	+1.09%
resource_index/index_update_all	1126.8±45.57ms	1106.3±45.55ms	-1.82%
resource_index/index_update_one	660.4±22.35ms	672.1±25.42ms	+1.77%

fs-index/src/index.rs

tareknaser · 2024-11-11T15:07:03Z

I made a few updates to the PR:

Added logging for the index_watch example
Added a note in the update_one() doc comment to specify that for rename or move operations, update_one() should be called twice
Added a couple of tests to cover these two cases
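
The rename rule from the doc comment note can be pictured with the toy sketch below: one call for the old path (now gone) and one for the new path. `ToyIndex`, its `update_one` signature with an explicit `current` id, and the `rename` helper are hypothetical simplifications; the real method inspects the file system itself.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Simplified id type; the real index uses hash-based resource ids.
type Id = u64;

/// Toy index mapping relative paths to resource ids.
#[derive(Debug, Default)]
struct ToyIndex {
    path_to_id: HashMap<PathBuf, Id>,
}

impl ToyIndex {
    /// Stand-in for `update_one`: `current` is `Some(id)` when the file
    /// exists on disk at `path`, and `None` when it is gone.
    fn update_one(&mut self, path: &Path, current: Option<Id>) {
        match current {
            Some(id) => {
                self.path_to_id.insert(path.to_path_buf(), id);
            }
            None => {
                self.path_to_id.remove(path);
            }
        }
    }

    /// A rename or move is two single-path updates: the old path is
    /// reported as removed, then the new path as added.
    fn rename(&mut self, old: &Path, new: &Path, id: Id) {
        self.update_one(old, None);
        self.update_one(new, Some(id));
    }
}
```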

github-actions · 2024-11-11T15:34:49Z

Benchmark for `5c4137d`

Test	Base	PR	%
blake3_resource_id_creation/compute_from_bytes:large	249.4±2.19µs	248.0±0.93µs	-0.56%
blake3_resource_id_creation/compute_from_bytes:medium	15.5±0.11µs	15.6±0.19µs	+0.65%
blake3_resource_id_creation/compute_from_bytes:small	1364.9±2.48ns	1370.8±13.09ns	+0.43%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg	196.9±2.37µs	201.6±0.38µs	+2.39%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf	1688.5±5.19µs	1734.0±24.14µs	+2.69%
crc32_resource_id_creation/compute_from_bytes:large	87.0±1.35µs	86.7±0.35µs	-0.34%
crc32_resource_id_creation/compute_from_bytes:medium	5.5±0.40µs	5.4±0.07µs	-1.82%
crc32_resource_id_creation/compute_from_bytes:small	92.9±2.77ns	92.4±0.77ns	-0.54%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg	64.4±0.33µs	64.3±0.16µs	-0.16%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf	915.0±4.78µs	915.6±5.14µs	+0.07%
resource_index/index_build//tmp/ark-fs-index-benchmarks4CiNdu	110.7±0.74ms	N/A	N/A
resource_index/index_build//tmp/ark-fs-index-benchmarks9xZzKq	112.2±1.63ms	N/A	N/A
resource_index/index_get_resource_by_id	100.3±1.52ns	97.2±0.59ns	-3.09%
resource_index/index_get_resource_by_path	60.9±0.22ns	54.4±0.54ns	-10.67%
resource_index/index_update_all	1097.1±28.25ms	1099.0±44.13ms	+0.17%
resource_index/index_update_one	664.7±15.27ms	663.4±17.03ms	-0.20%

fs-index/src/tests.rs

github-actions · 2024-11-20T11:07:39Z

Benchmark for `4d69b11`

Test	Base	PR	%
blake3_resource_id_creation/compute_from_bytes:large	249.7±1.05µs	249.0±2.75µs	-0.28%
blake3_resource_id_creation/compute_from_bytes:medium	15.8±2.31µs	15.5±0.05µs	-1.90%
blake3_resource_id_creation/compute_from_bytes:small	1355.3±35.24ns	1345.9±4.30ns	-0.69%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg	197.5±3.48µs	197.8±2.29µs	+0.15%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf	1681.1±11.75µs	1690.6±20.60µs	+0.57%
crc32_resource_id_creation/compute_from_bytes:large	92.0±1.47µs	91.7±0.46µs	-0.33%
crc32_resource_id_creation/compute_from_bytes:medium	5.7±0.11µs	5.7±0.06µs	0.00%
crc32_resource_id_creation/compute_from_bytes:small	96.4±0.43ns	96.5±1.05ns	+0.10%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg	64.8±0.92µs	64.9±0.66µs	+0.15%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf	920.3±16.31µs	910.3±11.27µs	-1.09%
resource_index/index_build//tmp/ark-fs-index-benchmarksAQUXEx	103.6±2.56ms	N/A	N/A
resource_index/index_build//tmp/ark-fs-index-benchmarksq7cThX	105.5±2.89ms	N/A	N/A
resource_index/index_get_resource_by_id	102.7±3.49ns	99.6±1.64ns	-3.02%
resource_index/index_get_resource_by_path	55.8±2.81ns	53.2±0.53ns	-4.66%
resource_index/index_update_all	1101.9±34.51ms	1122.1±49.65ms	+1.83%
resource_index/index_update_one	662.7±21.92ms	677.0±22.89ms	+2.16%

kirillt

Thanks, great job

github-actions · 2024-11-20T15:47:09Z

Benchmark for `5031b4a`

Test	Base	PR	%
blake3_resource_id_creation/compute_from_bytes:large	248.4±0.35µs	249.5±1.34µs	+0.44%
blake3_resource_id_creation/compute_from_bytes:medium	15.6±0.23µs	15.5±0.06µs	-0.64%
blake3_resource_id_creation/compute_from_bytes:small	1345.5±1.71ns	1346.0±2.10ns	+0.04%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg	197.5±0.75µs	197.5±1.17µs	0.00%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf	1692.2±12.89µs	1692.1±30.31µs	-0.01%
crc32_resource_id_creation/compute_from_bytes:large	91.8±0.19µs	92.6±5.35µs	+0.87%
crc32_resource_id_creation/compute_from_bytes:medium	5.7±0.01µs	5.7±0.03µs	0.00%
crc32_resource_id_creation/compute_from_bytes:small	96.4±0.69ns	96.5±1.59ns	+0.10%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg	65.2±2.36µs	65.6±0.96µs	+0.61%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf	913.1±6.70µs	917.4±4.90µs	+0.47%
resource_index/index_build//tmp/ark-fs-index-benchmarksNNAGfc	108.3±1.74ms	N/A	N/A
resource_index/index_build//tmp/ark-fs-index-benchmarksm7c99V	107.3±1.55ms	N/A	N/A
resource_index/index_get_resource_by_id	103.2±3.77ns	100.7±2.39ns	-2.42%
resource_index/index_get_resource_by_path	54.0±1.92ns	57.3±2.15ns	+6.11%
resource_index/index_update_all	1118.0±45.37ms	1114.5±52.34ms	-0.31%
resource_index/index_update_one	690.5±25.55ms	655.5±25.81ms	-5.07%

kirillt reviewed Apr 28, 2024

fs-index/examples/index_watch.rs (outdated)

kirillt reviewed May 11, 2024

fs-index/src/watch.rs (outdated)

kirillt reviewed May 11, 2024

fs-index/src/watch.rs (outdated)

kirillt reviewed May 11, 2024

fs-index/src/watch.rs

kirillt reviewed May 11, 2024

fs-index/src/watch.rs (outdated)

kirillt reviewed May 11, 2024

fs-index/src/watch.rs (outdated)

tareknaser force-pushed the watch branch from de28f05 to df7879d Compare May 12, 2024 06:14

tareknaser marked this pull request as draft May 12, 2024 18:25

tareknaser mentioned this pull request Sep 1, 2024

fs-index: Track API #85

Merged

tareknaser force-pushed the watch branch from df7879d to 6aa9e49 Compare September 9, 2024 09:20

tareknaser marked this pull request as ready for review September 9, 2024 09:23

kirillt reviewed Sep 9, 2024

fs-index/src/watch.rs (outdated, resolved)

kirillt reviewed Sep 9, 2024

fs-index/src/watch.rs (outdated, resolved)

kirillt reviewed Sep 10, 2024

fs-index/src/watch.rs (outdated, resolved)

kirillt reviewed Sep 10, 2024

fs-index/Cargo.toml (outdated, resolved)

tareknaser mentioned this pull request Sep 16, 2024

Improve ResourceIndex::update_one() for More Detailed Events #89

Open

kirillt reviewed Sep 16, 2024

fs-index/src/index.rs (resolved)

kirillt reviewed Sep 16, 2024

fs-index/src/index.rs (outdated, resolved)

kirillt reviewed Sep 16, 2024

fs-index/src/index.rs (resolved)

kirillt mentioned this pull request Sep 16, 2024

#41: Add unit tests, fix update_one ARK-Builders/arklib#72

Closed

kirillt reviewed Oct 30, 2024

integration/ark-cli-watch.sh (outdated, resolved)

kirillt reviewed Nov 10, 2024

fs-index/src/index.rs (resolved)

kirillt reviewed Nov 13, 2024

fs-index/src/tests.rs (resolved)

kirillt approved these changes Nov 20, 2024

tareknaser added 3 commits November 20, 2024 17:25

feat(fs-index): implement index watch functionality using notify (f94a9fb)
and enhance `update_one()` functionality
Signed-off-by: Tarek <[email protected]>

feat(fs-index): add integration tests for ark-cli watch functionality (d1f909c)
Signed-off-by: Tarek <[email protected]>

feat(ark-cli): add watch command to monitor directory changes and update index (6e73872)
Signed-off-by: Tarek <[email protected]>

tareknaser force-pushed the watch branch from 00b8d2e to 6e73872 Compare November 20, 2024 15:27

tareknaser merged commit 66c9362 into main Nov 20, 2024
4 checks passed

tareknaser deleted the watch branch November 20, 2024 16:10
