fscrawler not indexing old files #1249
cadm-frank
started this conversation in
General
Replies: 1 comment 4 replies
-
We're facing the same problem. Here's our solution.
-
Hello,
we noticed that fscrawler does not index files with old timestamps once it has finished its first run.
This happens e.g. when files are moved around by users on a Samba share.
Touching those files is not an option, as it would modify users' files.
Reindexing everything every couple of minutes is out as well, due to the amount of time required for a full index, which in our case is multiple days.
I think this is basically #136 and #419, but the problem is that the timestamps on a file or directory alone do not carry enough information to decide whether that file needs to be indexed.
Here's an example:
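The original example did not survive extraction; the following is a hypothetical reconstruction (the `test`/`test_has_been_moved` names come from the text below, the 2020 date is assumed, and GNU coreutils `touch`/`stat` are required):

```shell
mkdir -p test
touch -d "2020-01-01 00:00:00" test/mynewfile   # give the file an old mtime
stat -c '%y %n' test/mynewfile                  # shows the 2020 timestamp
mv test test_has_been_moved                     # a user "moves" the directory
stat -c '%y %n' test_has_been_moved/mynewfile   # still the 2020 timestamp
# Renaming the directory never touches the file's inode, so its mtime and
# ctime are unchanged; a crawler that only compares file timestamps against
# the time of its last run will never pick this file up again.
```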
Note that test_has_been_moved/mynewfile has exactly the same timestamps as test/mynewfile did, but it still needs to be reindexed, even though it appears old.
So when the ctime of a directory changes, you would have to force a full reindex of that directory and all of its subdirectories, unless you have a list of known files to compare against.
But forcing a reindex of a full directory on a ctime change is an issue, because ctime is also updated when a new file is created in the directory:
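A minimal sketch of the ambiguity (directory and file names assumed, GNU `stat` required): creating an ordinary new file bumps the parent directory's ctime in exactly the same way a move into it would.

```shell
mkdir -p watched
before=$(stat -c '%Z' watched)   # directory ctime before
sleep 1
touch watched/brand_new_file     # an ordinary new file...
after=$(stat -c '%Z' watched)    # ...updates the directory's ctime too
echo "$before -> $after"
# A changed directory ctime therefore cannot distinguish "a file was moved
# here" from "a file was created here"; both look identical to the crawler.
```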
So in essence, forcing a reindex of a directory and all its subdirectories on a ctime change may cause some unnecessary indexing, depending on the circumstances, but it should catch every file move.