Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement .nomedia/.noimage filter #944

Merged
merged 1 commit into from
Mar 17, 2022
Merged

Implement .nomedia/.noimage filter #944

merged 1 commit into from
Mar 17, 2022

Conversation

marcelklehr
Copy link
Member

@marcelklehr marcelklehr commented Jan 2, 2022

for Timeline and Tags. Happy new year ✨

fixes #234

The main difference to #939 is that that one maintains a DB table with excluded paths which is the loaded on the client using initialState, while this PR uses a DAV SEARCH for .nomedia/.noimage files on initial page load uses a cached file search loaded on the client using initialState.

@szaimen szaimen added this to the Nextcloud 24 milestone Jan 2, 2022
@szaimen szaimen added 3. to review Waiting for reviews enhancement New feature or request labels Jan 2, 2022
@marcelklehr marcelklehr force-pushed the feat/nomedia branch 2 times, most recently from ae40bc4 to 957674b Compare January 2, 2022 16:19
Copy link
Member

@skjnldsv skjnldsv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I'm afraid such request will have an insane load on instances with high number of files.
Do you have some rough benchmarks on how long this search takes? :)

Happy new year Marcel!! 🎉 🤗

@marcelklehr
Copy link
Member Author

marcelklehr commented Jan 3, 2022

I'm afraid such request will have an insane load on instances with high number of files.
Do you have some rough benchmarks on how long this search takes? :)

Mh, I see, that would be bad. But why would this specific DAV search query take longer than e.g. the "Your Photos" search query? I'm happy to do a benchmark, though :)

@marcelklehr
Copy link
Member Author

marcelklehr commented Jan 3, 2022

Benchmark

I loaded ~10.000 files into my dev instance. Not sure if that's enough, big instances are probably considerably larger, but it's something.

MariaDB [nextcloud]> select COUNT(*) from oc_filecache;
+----------+
| COUNT(*) |
+----------+
|     9716 |
+----------+
1 row in set (0.027 sec)

The DAV search for .nomedia/.noimage files with infinite depth for the current user takes ~247ms while the normal Timeline DAV search takes ~200ms.

image

@skjnldsv
Copy link
Member

skjnldsv commented Jan 4, 2022

The DAV search for .nomedia/.noimage files with infinite depth for the current user takes ~247ms

That is quite high. It means we'd have to wait for quite some time before we get the final list of files 🤔

@marcelklehr
Copy link
Member Author

marcelklehr commented Jan 4, 2022

It means we'd have to wait for quite some time before we get the final list of files

As I understand it the two queries should be executed in parallel, because Vue doesn't wait for the created hook to finish, so the load time should be increased by ~50ms in my setup.

I'm happy to close this PR in favor of #939, though, as long as we can land .nomedia filtering. :)

@marcelklehr
Copy link
Member Author

/compile amend /js

@skjnldsv
Copy link
Member

Maybe it would be best to have a private API for that. Just returning the ignored paths?

@Pytal Pytal requested review from a team and szaimen and removed request for a team January 14, 2022 19:04
@skjnldsv skjnldsv added 2. developing Work in progress and removed 3. to review Waiting for reviews labels Jan 16, 2022
@marcelklehr
Copy link
Member Author

Maybe it would be best to have a private API for that. Just returning the ignored paths?

@skjnldsv That would go into the direction of #939 if I understand you correctly. Although, #939 maintains a database table for ignored paths. A compromise between the two PRs would be to have an initial state that does the same query as the SEARCH request in this PR?

@skjnldsv
Copy link
Member

A compromise between the two PRs would be to have an initial state

Either initial state or dedicated request, both are fine.
if the request is expensive for the server, I would rather have this on-demand than on photos app init.

The database extra seems a bit overpowered, maybe it's worth doing a bit of in-depth analysis on how long and expensive a php search request would be. Then we can maybe drop the table idea. 🤷

@marcelklehr
Copy link
Member Author

marcelklehr commented Jan 17, 2022

in-depth analysis on how long and expensive a php search request would be

I'd defer to @icewind1991 on this, but from what I understand, there's the file cache table in the database which lists all files in the instance and fetching all .nomedia files from that should be fairly fast, be it using a WebDAV SEARCH request or a custom SQL query:

MariaDB [nextcloud]> explain select * from oc_filecache JOIN oc_mounts ON oc_filecache.storage = oc_mounts.storage_id where name = '.nomedia' and oc_mounts.user_id = 'admin';
+------+-------------+--------------+------+-----------------------------------------------------------------------------------------------------+------------------------+---------+--------------------------------+------+-----------------------+
| id   | select_type | table        | type | possible_keys                                                                                       | key                    | key_len | ref                            | rows | Extra                 |
+------+-------------+--------------+------+-----------------------------------------------------------------------------------------------------+------------------------+---------+--------------------------------+------+-----------------------+
|    1 | SIMPLE      | oc_mounts    | ref  | mounts_user_root_index,mounts_storage_index                                                         | mounts_user_root_index | 258     | const                          | 1    | Using index condition |
|    1 | SIMPLE      | oc_filecache | ref  | fs_storage_path_hash,fs_storage_mimetype,fs_storage_mimepart,fs_storage_size,fs_storage_path_prefix | fs_storage_mimetype    | 8       | nextcloud.oc_mounts.storage_id | 1552 | Using where           |
+------+-------------+--------------+------+-----------------------------------------------------------------------------------------------------+------------------------+---------+--------------------------------+------+-----------------------+

on-demand

On-demand is hard to do, IMHO, because we don't know whether there are any .nomedia files until we've fetched the list, right?

@skjnldsv
Copy link
Member

skjnldsv commented Jan 17, 2022

On-demand is hard to do, IMHO, because we don't know whether there are any .nomedia files until we've fetched the list, right?

you can make that part of the async/await process.
run both in parallel and filter out paths? 🤔

EDIT: but it'sj just brain dumps, feel free to go for an initialstate approach too :)

@marcelklehr
Copy link
Member Author

you can make that part of the async/await process.
run both in parallel and filter out paths? thinking

That's what I'm doing in this PR already I believe. The request I add in these commits never blocks anything. So, while the request does cause load on the server, it theoretically shouldn't affect the user experience.

Maybe it would be best to have a private API for that.
[...]
Either initial state or dedicated request, both are fine.

My crucial point is that the main bottleneck is (probably) SQL (for querying the file cache) in this, if we use WebDAV, or initialState or a private API, we still have the same SQL query in the end (minus the XML parsing, granted).

A private API for this IMHO has no benefits over using the official WebDAV SEARCH endpoint. Using initial state has the drawback of adding the runtime of the query to the initial page load, which of course blocks the app loading, while using a request on demand on vue.js init does not.

The performance impact of the query remains.

One way to make the SEARCH request faster is an index on the filecache table à la CREATE INDEX fast_filename_search ON oc_filecache (storage, name); (which I'm surprised does not yet exist), but this would have to be in core. (cc @icewind1991 Is this feasable / a good idea ?)

Alternatively we go down the route of constructing our own "index" of .nomedia files á la #939 (which I also find a bit overengineering from a software engineering perspective, but what can you do).

@marcelklehr
Copy link
Member Author

marcelklehr commented Jan 25, 2022

A compromise would be to cache the result of searching for .nomedia files in PHP, based on the Etag of the user's root folder. Then we'd only have the performance hit of the SQL query once, in case the file tree of the current user has changed (instead of on every photos app load), but we wouldn't have the overhead of adding extra db tables and a file system hook.

@skjnldsv What do you think?

@marcelklehr
Copy link
Member Author

/compile amend /

@marcelklehr
Copy link
Member Author

@icewind1991 Thanks for getting back to me on this :) Do you think my current approach (cached file cache search based on user folder Etag) makes sense (especially performance wise)? How would you recommend to handle this?

@skjnldsv
Copy link
Member

Looks great now @marcelklehr :)
I don't know where we should document this though. Having .nomedia is super technical. The best would be to have a dedicated way to select paths to exclude or #141.
Documenting would maybe makes it a bit too visible? Any opinions?

@marcelklehr
Copy link
Member Author

Documenting would maybe makes it a bit too visible? Any opinions?

We could add a note to the README, maybe, but I wouldn't add it to the app UI

@szaimen
Copy link
Contributor

szaimen commented Jan 28, 2022

/rebase

I fear you need to rebase manually:

git pull --rebase origin master

and then

git push --force-with-lease

@marcelklehr
Copy link
Member Author

@szaimen party time 🎉 😆

@marcelklehr
Copy link
Member Author

Do you want me to fix the psalm errors?

@marcelklehr
Copy link
Member Author

ping @skjnldsv

@marcelklehr
Copy link
Member Author

/compile amend /

Copy link
Member

@skjnldsv skjnldsv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! 👍

for Timeline and Tags

fixes #234

Signed-off-by: Marcel Klehr <[email protected]>
@ntrp
Copy link

ntrp commented Apr 26, 2022

How I can see in which release is this coming? I was waiting for something like this since 2 years *_*

EDIT: I guess v24, I have just seen the milestone

@ParaplegicRacehorse
Copy link

Are these (.noimage/.nomedia) supposed to be files resident in the filesystem, or folder tags?

@marcelklehr
Copy link
Member Author

Simply create an empty file in the file system

@ronny332
Copy link

ronny332 commented Jun 20, 2022

does not work for me. has the files system wie crawled again after putting the file in the folder? I can't find the point where the scanFolder() method from AlbumsController.php runs to find out if the file exists or not. It seems not to be started at any time.

don't know why, but started working after running some extra occ files:scan (runs by cron together with preview:generate may times a day). perfect 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2. developing Work in progress enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

.nomedia / .noimage file not respected within "Your photos"
7 participants