Discussion about common method of keeping track of (to) scanned files #6529

Closed
bartv2 opened this issue Dec 20, 2013 · 9 comments
bartv2 (Contributor) commented Dec 20, 2013

From owncloud/music#90 (comment):

I had to find a solution for the 'scan a lot of files' problem when working on search_lucene.

IMHO the files app has the best scanning mechanism because it saves the scan status in the filecache table. Can we find a general solution for scanning mechanisms? How do we mark a file as scanned by search_lucene (I currently join my own status table with the filecache table ... but meh) or by music (or by pictures or by videos)? All of them could extract metadata. I personally would like to explore zend_lucene for searching through that ... (cc @andrewsbrown) ... but I may be biased. Perhaps we can come up with a more general search API that can use either the database or zend_lucene ...

Anyway, I propose the following approach: we create a new onscan event/hook that apps that want to extract metadata can register for. Correct me if I am wrong, but the files app currently determines the mime type and basic file metadata. music could listen for audio files and then use getID3 to extract more specific metadata; search_lucene likewise. Unfortunately, this naive implementation would make synchronization take longer, because the files app scans a file on write, so an upload via WebDAV would have to wait for all metadata extraction to complete. Therefore we need to separate "files app" scanning and metadata indexing from each other. This is where background processes come into play. I just don't know yet whether adding a separate job for each file makes sense or whether a single "reindex" job is more elegant. The problem with a "reindex" job is that you don't want to reindex all files, only the ones that are new. In search_lucene I use a status table to keep track of this. Music could use a flag in the songs table, but then we are duplicating the mechanism. Single jobs will fill the jobs table quite quickly ...

I also have a table for the antivirus background scanner, so having a common solution would be nice.
@icewind1991 @karlitschek @kabum @DeepDiver1975
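
For illustration, a rough sketch of the proposed split between the synchronous write hook and deferred metadata extraction, assuming the legacy `\OCP\Util::connectHook` API and the `\OC::$server->getJobList()` job list from later releases; the app namespace `OCA\MyIndexer` and the `MetaExtractJob` class are hypothetical names, not existing code:

```php
<?php
// Hypothetical sketch only. The hook handler runs synchronously on every
// write, so it merely queues a background job; the heavy metadata
// extraction (getID3, Lucene indexing, ...) happens later in cron.

namespace OCA\MyIndexer;

class Hooks {
	/**
	 * Legacy OC_Filesystem post_write hook: $params contains the written path.
	 */
	public static function postWrite(array $params) {
		// Defer the expensive work to a queued background job.
		\OC::$server->getJobList()->add(
			'OCA\MyIndexer\MetaExtractJob',
			['path' => $params['path']]
		);
	}
}

// appinfo/app.php would register the handler:
\OCP\Util::connectHook('OC_Filesystem', 'post_write', 'OCA\MyIndexer\Hooks', 'postWrite');
```

This way the WebDAV upload returns as soon as the files app scan is done, and each interested app (music, search_lucene, antivirus, ...) decides in its own job how much work it actually does per file.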

DeepDiver1975 (Member) commented

We recently introduced some hooks to the scanner ... @icewind1991
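
For reference, attaching to those scanner hooks could look roughly like the sketch below. `\OC\Files\Cache\Scanner` is internal API that extends `BasicEmitter`, and the `postScanFile` signal name is recalled from memory and may differ between versions; the `MetaExtractJob` class name is hypothetical:

```php
<?php
// Illustration only: listen on the cache scanner's emitter for files it has
// just scanned. Uses internal classes (Filesystem::resolvePath, getScanner),
// so treat this as a sketch, not a stable public API.

list($storage, $internalPath) = \OC\Files\Filesystem::resolvePath('/alice/files');
$scanner = $storage->getScanner();

$scanner->listen('\OC\Files\Cache\Scanner', 'postScanFile', function ($path) {
	// React only to what the scanner actually touched, e.g. queue the file
	// for metadata extraction instead of re-walking the whole tree.
	\OC::$server->getJobList()->add('OCA\MyIndexer\MetaExtractJob', ['path' => $path]);
});

$scanner->scan($internalPath);
```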

MorrisJobke (Contributor) commented

@DeepDiver1975 Does this solve the problem for an initial scan?

DeepDiver1975 (Member) commented

@kabum you are talking about the situation where files already exist, an app gets enabled and has to perform an initial scan - right?

MorrisJobke (Contributor) commented

@DeepDiver1975 yes

PVince81 (Contributor) commented Aug 3, 2015

Is this still valid?

MorrisJobke (Contributor) commented
@PVince81 Yes. This is about storing a "processed" status flag on a per-app basis.

PVince81 (Contributor) commented Oct 7, 2016

We already have scan hooks; these should also be triggered only for new files.
The initial scan now happens in a background job.
For the processed flag, every app could, if needed, decide to store such a flag in an app-specific table.

@butonic what do you think, in relation to search ? Keep or close ?
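
For the record, such an app-specific flag could look roughly like what search_lucene already does: a small status table keyed by `fileid`, joined against the filecache to find files that still need processing. A sketch using the public `IDBConnection`; the table name `*PREFIX*myapp_status`, its columns, and the helper function are hypothetical:

```php
<?php
// Hypothetical sketch: *PREFIX*myapp_status holds one row per file the app
// has already processed (fileid referencing filecache.fileid). Finding the
// remaining work is then a single LEFT JOIN instead of a full rescan.

use OCP\IDBConnection;

function findUnprocessedFiles(IDBConnection $db, $storageId) {
	$result = $db->executeQuery(
		'SELECT `fc`.`fileid`, `fc`.`path`'
		. ' FROM `*PREFIX*filecache` `fc`'
		. ' LEFT JOIN `*PREFIX*myapp_status` `st` ON `st`.`fileid` = `fc`.`fileid`'
		. ' WHERE `fc`.`storage` = ? AND `st`.`fileid` IS NULL',
		[$storageId]
	);
	$files = [];
	while ($row = $result->fetch()) {
		$files[] = $row; // caller processes the file, then inserts a status row
	}
	return $files;
}
```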

lock bot commented Aug 2, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators Aug 2, 2019