Discussion about common method of keeping track of (to) scanned files #6529

Closed
bartv2 opened this issue Dec 20, 2013 · 9 comments
bartv2 (Contributor) commented Dec 20, 2013

From owncloud/music#90 (comment):

I had to find a solution for the 'scan a lot of files' problem when working on search_lucene.

IMHO the files app has the best scanning mechanism because it saves the scan status in the filecache table. Can we find a general solution for scanning mechanisms? How do we mark a file as scanned by search_lucene (I currently join my own status table with the filecache table ... but meh) or by music (or by pictures or by videos)? All of them could extract metadata. I personally would like to explore zend_lucene for searching through that ... (cc @andrewsbrown) ... but I may be biased. Perhaps we can come up with a more general search API that can use either the database or zend_lucene ...

Anyway, I propose the following approach: we create a new onscan event/hook that apps that want to extract metadata can register for. Correct me if I am wrong, but the files app currently determines the mime type and basic file metadata. music could listen for audio files and then use getID3 to extract more specific metadata; search_lucene likewise. Unfortunately, this naive implementation would make synchronization take longer, because the files app scans a file on write, so an upload via WebDAV would have to wait for all metadata extraction to complete. Therefore we need to separate "files app" scanning and metadata indexing from each other. This is where background processes come into play. I just don't know yet whether adding a separate job for each file makes sense or whether a single "reindex" job is more elegant. The problem with a "reindex" job is that you don't want to reindex all files, only the ones that are new. In search_lucene I use a status table to keep track of this. Music could use a flag in the songs table, but then we are duplicating the mechanism. Single jobs will fill the jobs table quite quickly ...

I also have a table for the antivirus background scanner, so having a common solution would be nice.
@icewind1991 @karlitschek @kabum @DeepDiver1975
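
For illustration, a rough sketch of the proposed split between the synchronous write hook and deferred metadata extraction, assuming the legacy `\OCP\Util::connectHook` API and the `\OC::$server->getJobList()` job list from later releases; the app namespace `OCA\MyIndexer` and the `MetaExtractJob` class are hypothetical names, not existing code:

```php
<?php
// Hypothetical sketch only. The hook handler runs synchronously on every
// write, so it merely queues a background job; the heavy metadata
// extraction (getID3, Lucene indexing, ...) happens later in cron.

namespace OCA\MyIndexer;

class Hooks {
	/**
	 * Legacy OC_Filesystem post_write hook: $params contains the written path.
	 */
	public static function postWrite(array $params) {
		// Defer the expensive work to a queued background job.
		\OC::$server->getJobList()->add(
			'OCA\MyIndexer\MetaExtractJob',
			['path' => $params['path']]
		);
	}
}

// appinfo/app.php would register the handler:
\OCP\Util::connectHook('OC_Filesystem', 'post_write', 'OCA\MyIndexer\Hooks', 'postWrite');
```

This way the WebDAV upload returns as soon as the files app scan is done, and each interested app (music, search_lucene, antivirus, ...) decides in its own job how much work it actually does per file.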

DeepDiver1975 (Member) commented

We recently introduced some hooks to the scanner ... @icewind1991
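
For reference, attaching to those scanner hooks could look roughly like the sketch below. `\OC\Files\Cache\Scanner` is internal API that extends `BasicEmitter`, and the `postScanFile` signal name is recalled from memory and may differ between versions; the `MetaExtractJob` class name is hypothetical:

```php
<?php
// Illustration only: listen on the cache scanner's emitter for files it has
// just scanned. Uses internal classes (Filesystem::resolvePath, getScanner),
// so treat this as a sketch, not a stable public API.

list($storage, $internalPath) = \OC\Files\Filesystem::resolvePath('/alice/files');
$scanner = $storage->getScanner();

$scanner->listen('\OC\Files\Cache\Scanner', 'postScanFile', function ($path) {
	// React only to what the scanner actually touched, e.g. queue the file
	// for metadata extraction instead of re-walking the whole tree.
	\OC::$server->getJobList()->add('OCA\MyIndexer\MetaExtractJob', ['path' => $path]);
});

$scanner->scan($internalPath);
```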

MorrisJobke (Contributor) commented

@DeepDiver1975 Does this solve the problem for an initial scan?

DeepDiver1975 (Member) commented

@kabum you are talking about the situation where files already exist, an app gets enabled and has to perform an initial scan - right?

MorrisJobke (Contributor) commented

@DeepDiver1975 yes

PVince81 (Contributor) commented Aug 3, 2015

Is this still valid?

MorrisJobke (Contributor) commented
@PVince81 Yes. This is about storing a "processed" status flag on a per-app basis.

PVince81 (Contributor) commented Oct 7, 2016

We already have scan hooks; these should also be triggered only for new files.
The initial scan now happens in a background job.
For the processed flag, every app could, if needed, decide to store such a flag in an app-specific table.

@butonic what do you think, in relation to search ? Keep or close ?
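
For the record, such an app-specific flag could look roughly like what search_lucene already does: a small status table keyed by `fileid`, joined against the filecache to find files that still need processing. A sketch using the public `IDBConnection`; the table name `*PREFIX*myapp_status`, its columns, and the helper function are hypothetical:

```php
<?php
// Hypothetical sketch: *PREFIX*myapp_status holds one row per file the app
// has already processed (fileid referencing filecache.fileid). Finding the
// remaining work is then a single LEFT JOIN instead of a full rescan.

use OCP\IDBConnection;

function findUnprocessedFiles(IDBConnection $db, $storageId) {
	$result = $db->executeQuery(
		'SELECT `fc`.`fileid`, `fc`.`path`'
		. ' FROM `*PREFIX*filecache` `fc`'
		. ' LEFT JOIN `*PREFIX*myapp_status` `st` ON `st`.`fileid` = `fc`.`fileid`'
		. ' WHERE `fc`.`storage` = ? AND `st`.`fileid` IS NULL',
		[$storageId]
	);
	$files = [];
	while ($row = $result->fetch()) {
		$files[] = $row; // caller processes the file, then inserts a status row
	}
	return $files;
}
```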

lock bot commented Aug 2, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators Aug 2, 2019