Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalize file name before existence check in scanner #29605

Merged
merged 3 commits into from
Nov 19, 2021

Conversation

PVince81
Copy link
Member

@PVince81 PVince81 commented Nov 9, 2021

The scanner would not find a NFD-encoded file name in an
existing file list that is normalized.

This normalizes the file name before scanning.

Fixes issues where scanning repeatedly would make NFD files flicker in
and out of existence in the file cache both with compatibility mode on and off.

Fixes #29603

TETS: For a quick test with "Local" storage, you need to remove the "isLocal" check on this line: https://github.com/nextcloud/server/blob/master/lib/private/legacy/OC_Util.php#L244

@PVince81
Copy link
Member Author

PVince81 commented Nov 9, 2021

maybe we could fire another event when we find a file that doesn't match normalization, the scan command can then listen to that one ?

@icewind1991
Copy link
Member

I guess we can move the compatibility warning to the same place (compare $file before and after)

Other than that I think this should be fine

@PVince81 PVince81 added 3. to review Waiting for reviews and removed 2. developing Work in progress labels Nov 10, 2021
@PVince81
Copy link
Member Author

I've now moved the file name encoding check to the scanner and am firing an event to trigger the warning, but only doing so if the encoding wrapper is not applied.

Fixes #29438 as well

@icewind1991
Copy link
Member

Thinking about this again, it probably makes more sense to add normalization to the output of Encoding::opendir. That way the scanner can have it's NFD check without having to special case the compatibility wrapper.

@PVince81
Copy link
Member Author

Thinking about this again, it probably makes more sense to add normalization to the output of Encoding::opendir. That way the scanner can have it's NFD check without having to special case the compatibility wrapper.

this would only work if the wrapper is applied
it would bring back the bug that rescanning twice will make entries appear and disappear again, see #29603

@icewind1991
Copy link
Member

I mean also doing the normalization in the scanner and show the warning the filename isn't normalized when it reaches the scanner (and we should maybe also just ignore it after showing the warning?)
That way the scanner doesn't have to know about any compatibility layers (just that non normalized file names are bad), and enabling the compatibility wrapper will correctly silence the warning

@PVince81
Copy link
Member Author

ok, so the way I understand, after your proposal it would work like this:

  1. without encoding wrapper (compat mode off): the scanner sees the NFD encoded name and compares it with the normalized one, then triggers the new event that will display the warning

  2. with encoding wrapper (compat mode on): Encoding::opendir() will pre-normalize all the directory entries, so that when the scanner sees a file name, it will already be normalized so no warning will be triggered

@PVince81
Copy link
Member Author

@icewind1991 I'm looking at DirectoryWrapper but wrapSource() is protected.

How should I proceed to provide a normalized list from the opendir handle in Encoding::opendir() ?

The only implementation I found is DirectoryFilter but that one is for filtering, not pre-processing entries

@icewind1991
Copy link
Member

It's probably easiest to first collect all files in an array, normalize them, and use IteratorDirectory to return it

The scanner would not find a NFD-encoded file name in an
existing file list that is normalized.

This normalizes the file name before scanning.

Fixes issues where scanning repeatedly would make NFD files flicker in
and out of existence in the file cache.

Signed-off-by: Vincent Petry <[email protected]>
The encoding check for file names is now happening the Scanner, and an
event will be emitted only if the storage doesn't contain the encoding
compatibility wrapper.

The event is listened to by the occ scan command to be able to display a
warning in case of file name mismatches when they have NFD encoding.

Signed-off-by: Vincent Petry <[email protected]>
@PVince81 PVince81 force-pushed the bugfix/29603/fix-scanner-nfd-existenz branch from d862cdc to 3533cab Compare November 17, 2021 08:24
@PVince81
Copy link
Member Author

@icewind1991 I had to touch 3 code paths:

  • getMetaData
  • getDirectoryContents
  • opendir

the scanner is using getDirectoryContents
to test the new EncodingDirectoryWrapper I temporarily copy-pasted Common::getDirectoryContents() into the wrapper to make it use opendir

@PVince81
Copy link
Member Author

ah, and as per your advice the scanner is now skipping non-normalized names when the encoding wrapper is not in place
this means that if one disabled encoding and rescans, the entries will disappear, which looks like acceptable behavior for me

I just hope that there won't be side effects with other kinds of weird files (non-NFD) that get cleant by the normalization.
To be safer we could also use \OC_Util::normalizeUnicode() directly instead of normalizePath.
But the likeliness of seeing strange backslashes or dots in a directory listing is probably extremely low.

@PVince81
Copy link
Member Author

PVince81 commented Nov 17, 2021

  • update unit tests of the wrapper

Directory entry file names are now normalized in getMetaData(),
getDirectoryContents() and opendir().

This makes the scanner work properly as it assumes pre-normalized names.

In case the names were not normalized, the scanner will now skip the
entries and display a warning when applicable.

Signed-off-by: Vincent Petry <[email protected]>
@PVince81 PVince81 force-pushed the bugfix/29603/fix-scanner-nfd-existenz branch from 3533cab to c92a0e4 Compare November 17, 2021 08:45
Copy link
Member

@icewind1991 icewind1991 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@PVince81 PVince81 merged commit aeb67f2 into master Nov 19, 2021
@PVince81 PVince81 deleted the bugfix/29603/fix-scanner-nfd-existenz branch November 19, 2021 14:29
@PVince81
Copy link
Member Author

/backport to stable23

@PVince81
Copy link
Member Author

/backport to stable22

@PVince81
Copy link
Member Author

/backport to stable21

@pzzszk
Copy link

pzzszk commented Dec 16, 2021

Perfect - Fixed in 23.0.0. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NFD file names appear and disappear repeatedly when scanning
4 participants