-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possibly expensive SQL Query on determining files to process #144
Comments
Hi, welcome back :D Thanks for the detailed description and all the information. This will certainly help a lot. In any case, I agree that the app should be more performant than it seems to be now, even with so many files. Due to a public holiday tomorrow and the following bridge day (for most people), this will probably have to wait a few days. But I'll get straight to have a look on it on Monday. |
You're Welcome If you need some "performance tests" I'm willing to run some SELECT statements against my DB Instance providing results. Enjoy your days off :-) |
Hey - I did some further experiments and analysis. It looks like the problematic part of the query is the LEFT JOIN with CAST STATEMENT. MariaDB can not utilize the indexes therefor. But when I create corresponding data types and indexes in the relevant
Problematic with this is that I've touched the nextcloud standard which could be problematic on updates/upgrade - maybe something similar can be achieved with an own GData VaaS mapping table exclusively track the files processed. Btw ... As you see on the result, the query above gives back the fileid multiple times, when the file has multiple tags not belonging to GData VaaS. This is true for my query as for the problematic one. I therefor further "optimized" mine with SELECT DISTINCT
|
It appears that we don't have to add new columns or indices. The current query is not filtering on oc_systemtag_object_mapping.objecttype. The composite index (objecttype, objectid, systemtagid) is not used. If you find the time, it would be great to know, if the new query is faster than 1s on your system. We will release the fix in the coming days. Test data
Current querySELECT
`f`.`fileid`
FROM
`oc_filecache` `f`
LEFT JOIN `oc_systemtag_object_mapping` `o` ON `f`.`fileid` = CAST(o.objectid AS UNSIGNED)
LEFT JOIN `oc_mimetypes` `m` ON `f`.`mimetype` = `m`.`id`
WHERE
(
(
`o`.`systemtagid` NOT IN (248, 249, 251, 250, 252)
)
OR (`o`.`systemtagid` IS NULL)
)
AND (`m`.`mimetype` NOT LIKE '%unix-directory%')
AND (
(`f`.`path` LIKE 'files/%')
OR (`f`.`path` LIKE '__groupfolders/%')
)
ORDER BY
`f`.`fileid` DESC
LIMIT
10000; Runtime: Aborted after 18min EXPLAIN:
New querySELECT
`f`.`fileid`
FROM
`oc_filecache` `f`
LEFT JOIN `oc_systemtag_object_mapping` `o` ON `f`.`fileid` = CAST(o.objectid AS UNSIGNED)
LEFT JOIN `oc_mimetypes` `m` ON `f`.`mimetype` = `m`.`id`
WHERE
(
(
`o`.`systemtagid` NOT IN (248, 249, 251, 250, 252)
)
OR (`o`.`systemtagid` IS NULL)
)
AND (o.objecttype = 'files')
AND (`m`.`mimetype` NOT LIKE '%unix-directory%')
AND (
(`f`.`path` LIKE 'files/%')
OR (`f`.`path` LIKE '__groupfolders/%')
)
ORDER BY
`f`.`fileid` DESC
LIMIT
10000; Runtime: 800ms EXPLAIN:
|
Optimize getFileIdsWithoutTags, getFileIdsWithTags
Optimize getFileIdsWithoutTags, getFileIdsWithTags
Still got some issues. The SQL query is not correct. Will update once I get it running. |
Cast fileid to objectid and use the index
String type for all supported DBs
Possibly expensive SQL Query on determining files to process #144 Optimize getFileIdsWithoutTags, getFileIdsWithTags --------- Co-authored-by: Philip Stadermann <[email protected]> Co-authored-by: Lennart Dohmann <[email protected]>
Should be solved with e266783 in the next release. |
Possibly expensive SQL Query on determining files to process #144 Optimize getFileIdsWithoutTags, getFileIdsWithTags --------- Co-authored-by: Philip Stadermann <[email protected]> Co-authored-by: Lennart Dohmann <[email protected]> (cherry picked from commit e266783)
Fix cast for MySQL (cherry picked from commit 7d8ae58)
Possibly expensive SQL Query on determining files to process #144 Optimize getFileIdsWithoutTags, getFileIdsWithTags --------- Co-authored-by: Philip Stadermann <[email protected]> Co-authored-by: Lennart Dohmann <[email protected]> (cherry picked from commit e266783)
Fix cast for MySQL (cherry picked from commit 7d8ae58)
Can confirm the fix - didn't see any entries anymore in the slow log for the problematic query after latest release. |
Description
Hey it's me again :-D
I've recognized a high load on my Maria DB Instance. It took me a while to identify what cause the load but the MariaDB slow log is very clear about the SQL Statement and I suspect to trace it down to the GData VaaS App.
It looks like the GData VaaS makes an uncomfortable SQL Query to the Database not performing well.
Reproduce
Install Nextcloud
Install gdatavaas App
Possibly have a lot of files + folder in your nextcloud and/or oc_filecache table - I count:
~293491
in my nc/oc data dir (overall)~27351
in my (sinlge) userdir~265999
in my appdata folder (including previews)~ 349134
in oc_filecache tableHardware simple NAS with:
For sure no enterprise grade hardware but also no numbers of files like enterprises have.
Log(s) & Analysis
Not an MariaDB/SQL Expert but it looks like for the first part of the query with
oc_filecache
table no index can be utilized and endup in a expensive full table scanoc_filecache
is able to utilize indexDatabase Stats
Some statistic data might help for the issue
oc_filecache
(f)oc_systemtag_object_mapping
(o)oc_mimetypes
(m)Available Indices
oc_filecache
As other apps (memories) maintain there own indices, this might be also a solution for GData VaaS - if some is missing/can not utilized
oc_mimetypes
oc_systemtag_object_mapping
Expected behavior
Versions
The text was updated successfully, but these errors were encountered: