Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(files): handle multidimensional arrays in scanner #43794

Merged
merged 1 commit into from
May 13, 2024

Conversation

joshtrichards
Copy link
Member

@joshtrichards joshtrichards commented Feb 23, 2024

Summary

array_diff_assoc() doesn't support multidimensional arrays on its own (and, when attempted, it internally casts any embedded array elements to strings to attempt to compare them - not only generating warnings like "Array to string conversion" but also then overlooking differences).

But sometimes we're passing it a multidimensional array.

I think the reason this became a new problem in v28 is because the metadata (#40761 / etc) gets embedded as a second level array here. It happens that triggering a scan on an Object Store is one way to trigger this code path just right to see the behavior found in #43408. (In other words: there may be other cases where an n-level array ends up here, but I don't know of one off-hand; and this should accommodate any of them in any case).

TODO

Checklist

@solracsf
Copy link
Member

solracsf commented Feb 23, 2024

Another potential approach: iterating over each element in the $data array and then compare each nested element individually, instead of serialize/unserialize (not tested though, may need some adjustements).

$newData = [];
foreach ($data as $key => $value) {
    if (is_array($value)) {
        if (!array_key_exists($key, $cacheData) || !is_array($cacheData[$key])) {
            $newData[$key] = $value;
        } else {
            $nestedDiff = [];
            foreach ($value as $nestedKey => $nestedValue) {
                if (!array_key_exists($nestedKey, $cacheData[$key]) || $cacheData[$key][$nestedKey] !== $nestedValue) {
                    $nestedDiff[$nestedKey] = $nestedValue;
                }
            }
            if (!empty($nestedDiff)) {
                $newData[$key] = $nestedDiff;
            }
        }
    } else {
        if (!array_key_exists($key, $cacheData) || $cacheData[$key] !== $value) {
            $newData[$key] = $value;
        }
    }
}

This comment was marked as resolved.

@skjnldsv skjnldsv added 2. developing Work in progress 3. to review Waiting for reviews and removed 2. developing Work in progress labels Feb 23, 2024
Copy link

@lavigne958 lavigne958 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this fix on my instance and it works. no more error logs from array_diff_assoc.

@skjnldsv skjnldsv removed their request for review February 24, 2024 10:24
Copy link
Contributor

@artonge artonge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the investigation and lengthy explanation. It makes sense IMO. :)

@joshtrichards

This comment was marked as outdated.

This comment was marked as outdated.

@solracsf

This comment was marked as outdated.

@joshtrichards joshtrichards force-pushed the fix/files/scanner-metadata-diff branch from ecd52e6 to e70375b Compare March 1, 2024 22:14
@hyoretsu

This comment was marked as abuse.

@come-nc

This comment was marked as resolved.

@icewind1991
Copy link
Member

array-diff-uassoc can be used to customize how the comparison is done.

I'm not a fan of round-tripping data through serialize if we can avoid it

This was referenced Mar 12, 2024
@skjnldsv skjnldsv added this to the Nextcloud 30 milestone Mar 28, 2024
@ahcm
Copy link

ahcm commented Apr 3, 2024

Sorry, if this in the wild, I just skimmed through the code.

Is it really worthwhile to only updating the diff?
Whatever DB is behind will do that probably anyways or be faster writing all.

@compuguy

This comment was marked as outdated.

@DaanSelen
Copy link

Is this patch slated to release for version 30 or any minor patches for 29?

@joshtrichards joshtrichards force-pushed the fix/files/scanner-metadata-diff branch from 0c018d5 to 6ff46ab Compare May 4, 2024 18:46
@joshtrichards
Copy link
Member Author

Reimplemented w/o serialization. Tests reasonably in my test bed. Mostly housekeeping items left.

Is this patch slated to release for version 30 or any minor patches for 29?

It'll likely make it into the next monthly maintenance releases for v27/v28/v29 (well before v30 is formally published).

@joshtrichards joshtrichards requested a review from artonge May 5, 2024 02:06
@joshtrichards joshtrichards self-assigned this May 5, 2024
lib/private/Files/Cache/Scanner.php Outdated Show resolved Hide resolved
lib/private/Files/Cache/Scanner.php Outdated Show resolved Hide resolved
Copy link

@lavigne958 lavigne958 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code has been running on my instance for the past 12h and not a single line of error has raised, it works fine 👍

@joshtrichards
Copy link
Member Author

/backport to stable29

@joshtrichards
Copy link
Member Author

/backport to stable28

lib/private/Files/Cache/Scanner.php Fixed Show fixed Hide fixed
lib/private/Files/Cache/Scanner.php Fixed Show fixed Hide fixed
@joshtrichards joshtrichards force-pushed the fix/files/scanner-metadata-diff branch from 1c515bf to 3c3e45f Compare May 8, 2024 12:36
@joshtrichards joshtrichards requested a review from come-nc May 8, 2024 13:41
@joshtrichards joshtrichards changed the title fix(files): Change how scanner diffs for changed data fix(files): handle multidimensional arrays in scanner May 9, 2024
@joshtrichards joshtrichards requested a review from kesselb May 9, 2024 22:01
@come-nc come-nc merged commit e68544d into master May 13, 2024
157 checks passed
@come-nc come-nc deleted the fix/files/scanner-metadata-diff branch May 13, 2024 09:05
@blizzz blizzz mentioned this pull request Jul 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet