Add wrapper for NFD encoding workaround #24349

Merged: PVince81 merged 5 commits into master from nfd-storagewrapper on May 23, 2016

Conversation

PVince81
Contributor

@PVince81 PVince81 commented Apr 29, 2016

Fixes #21365

Adds a storage wrapper, "EncodingWrapper", that first checks which form of the file name (NFC or NFD) exists on the storage, and then uses that path to perform operations.
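
For illustration, a minimal sketch of the "try both forms, use the one that exists" idea described above. It assumes PHP's intl extension (for `Normalizer`); `pickExistingForm` and the `$exists` callback are hypothetical names used only for this sketch and are not the wrapper's actual API.

```php
<?php
// Sketch only; requires PHP's intl extension for the Normalizer class.

// "ü" as one code point (NFC) versus "u" plus a combining diaeresis (NFD).
$nfc = "bad-\u{00FC}mlaut.txt";
$nfd = Normalizer::normalize($nfc, Normalizer::FORM_D);

/**
 * Hypothetical helper (not the PR's code): return the form of $path that
 * exists according to the $exists callback, or $path unchanged if neither does.
 */
function pickExistingForm(string $path, callable $exists): string
{
    if ($exists($path)) {
        return $path;
    }
    foreach ([Normalizer::FORM_C, Normalizer::FORM_D] as $form) {
        $candidate = Normalizer::normalize($path, $form);
        if ($candidate !== false && $candidate !== $path && $exists($candidate)) {
            return $candidate;
        }
    }
    return $path; // nothing found, e.g. the file is about to be created
}

// Stand-in for a remote storage that only holds the NFD name.
$remote = [$nfd => true];
$exists = function (string $p) use ($remote): bool {
    return isset($remote[$p]);
};

$resolved = pickExistingForm($nfc, $exists);
var_dump($nfc === $nfd);      // the two forms differ byte-wise
var_dump($resolved === $nfd); // the NFD name is the one that gets used
```

In a real wrapper the callback would be the wrapped storage's `file_exists`, which is what makes the lookup expensive on remote storages.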

TODO

  • TODO: introduce wrapper
  • TODO: make wrapper opt-in using mount option
  • TODO: unit tests

Bugs

Currently works ok-ish with SFTP, needs to be tested with all file operations

  • BUG: cache entries randomly appear/disappear, possibly some code paths bypassing the wrapper
  • BUG: files:scan sometimes makes it disappear from cache
  • BUG: two NFD entries in same folder seem to bug

Tests

  • TEST: download
  • TEST: upload multiple files into NFD folder, refresh folder to see its contents
  • TEST: upload + overwrite file into NFD folder
  • TEST: upload + overwrite file with NFD name (the uploaded file name is NFC)
  • TEST: rename
  • TEST: move to storage subdir
  • TEST: cross-storage move (move out of the ext storage)
  • TEST: copy
  • TEST: propfind
  • TEST: occ files:scan
  • TEST: watcher (touch the files remotely then PROPFIND the folder)
  • TEST: permanent delete (no trashbin)
  • TEST: delete to trash
  • TEST: restore from trash
  • TEST: versions creation
  • TEST: version download
  • TEST: version restore
  • TEST: encryption
  • TEST: work inside folder that has a NFD name

External storages

  • SFTP
  • SMB
  • ...

@icewind1991 @DeepDiver1975 @nickvergessen FYI

@PVince81
Contributor Author

PVince81 commented May 2, 2016

More progress. More file operations work now and the option is opt-in in the UI now.

@PVince81
Contributor Author

PVince81 commented May 3, 2016

  • BUG: delete a file within a folder containing a "bad ümlaut", the file disappears after deletion (possibly rogue rescan)

@PVince81
Contributor Author

PVince81 commented May 3, 2016

Arghh, another folder listing diff that needs NFD support: https://github.com/owncloud/core/blob/v9.0.1/lib/private/files/cache/scanner.php#L397

@PVince81
Contributor Author

PVince81 commented May 3, 2016

  • TEST: work in folder name like "/nfd_name/nfc_name/nfd_name"

Will not work with the current logic (which transforms the whole path to NFC/NFD). Need an algo that goes through all path sections and calls file_exists on each one. 😞
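
A small sketch of why whole-path conversion cannot hit a mixed path, assuming the intl extension is available. The variable names are illustrative only.

```php
<?php
// Sketch only; requires PHP's intl extension for the Normalizer class.

$nfcName = "\u{00FC}";   // "ü" precomposed (NFC)
$nfdName = "u\u{0308}";  // "u" + combining diaeresis (NFD)

// The path as it really exists on the storage: nfd_name/nfc_name/nfd_name
$mixed = $nfdName . '/' . $nfcName . '/' . $nfdName;

// Converting the whole path yields all-NFC or all-NFD, never the mixed form.
$allNfc = Normalizer::normalize($mixed, Normalizer::FORM_C);
$allNfd = Normalizer::normalize($mixed, Normalizer::FORM_D);

var_dump($allNfc === $mixed); // neither candidate equals the real path,
var_dump($allNfd === $mixed); // so file_exists would fail for both
```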

@PVince81
Contributor Author

PVince81 commented May 3, 2016

  • BUG: cannot upload into a folder with NFD name

Reason is the same as above. Need to change findPathToUse to do a file_exists on every section.

@PVince81
Contributor Author

PVince81 commented May 3, 2016

Added some unit tests, might not be enough to cover all.

* @return string original or converted path
*/
private function findPathToUse($path) {
if ($path !== '' && !$this->isAscii($path)) {
Contributor

Nitpick: the whole function would save indentation if you said

if ($path === '' || $this->isAscii($path)) {
    return $path;
}

Wouldn't that be nicer?

Contributor

And: the isAscii function initializes a regexp engine, which is never cheap.

I haven't investigated the details, but don't you think a simple for loop over the string checking for non-ASCII chars would be cheaper at runtime?
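
To make the two options concrete, here is a sketch of both variants. The function names `isAsciiLoop` and `isAsciiRegex` are hypothetical, and the regex shown is only an assumption about what a regex-based check might look like. Which one is actually faster in PHP would need measuring, since the loop runs in the interpreter while the regex runs in native code.

```php
<?php
// Sketch only; neither function is the PR's actual isAscii implementation.

/** Byte loop: any byte above 127 means the string is not pure ASCII. */
function isAsciiLoop(string $s): bool
{
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        if (ord($s[$i]) > 127) {
            return false;
        }
    }
    return true;
}

/** Regex variant: look for any byte outside the 0x00-0x7F range. */
function isAsciiRegex(string $s): bool
{
    return preg_match('/[^\x00-\x7F]/', $s) !== 1;
}

var_dump(isAsciiLoop('plain/path.txt'));        // ASCII only
var_dump(isAsciiLoop("bad-\u{00FC}mlaut.txt")); // contains a non-ASCII char
```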

Contributor Author

We got plenty of indentation in stock so no need to save them 😉

Yeah, good point. I'll change it next time I touch this code, which is likely to be soon. The algo isn't adequate because every path section needs its own NFC/NFD trial...

@PVince81
Contributor Author

PVince81 commented May 4, 2016

  • TODO: optimize isAscii

@dragotin
Contributor

dragotin commented May 4, 2016

On a more general note, and sorry if my comments miss the facts, I haven't investigated too deeply: the findPathToUse() function, which does the costly conversion, is called in most of the low-level functions. It looks like that can happen multiple times if code like

if( file_exists($path) ) {
   remove_file($path); 
}

is run: that might result in calling the conversion function twice, which can eat performance.

The big gun would of course be a Path class that knows whether the encoding has already been checked.

@PVince81
Contributor Author

PVince81 commented May 4, 2016

@dragotin actually the expensive lookup for each path is only done once, the result is saved in the $namesCache array which should prevent redoing the lookup several times.
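
A minimal sketch of that caching behaviour, for illustration. Only the `$namesCache` and `findPathToUse` names come from this PR; the `CachedPathFinder` class, the `$resolver` callback and the `$lookups` counter are assumptions made for the sketch.

```php
<?php
// Sketch only; the class and its resolver callback are hypothetical.

class CachedPathFinder
{
    /** @var array<string, string> requested path => path to use */
    private $namesCache = [];

    /** @var callable stand-in for the expensive storage lookup */
    private $resolver;

    /** @var int number of times the expensive lookup actually ran */
    public $lookups = 0;

    public function __construct(callable $resolver)
    {
        $this->resolver = $resolver;
    }

    public function findPathToUse(string $path): string
    {
        if (isset($this->namesCache[$path])) {
            return $this->namesCache[$path]; // cheap: no storage round trip
        }
        $this->lookups++;
        $resolved = ($this->resolver)($path);
        $this->namesCache[$path] = $resolved;
        return $resolved;
    }
}

// The resolver here just returns the path; a real one would probe the storage.
$finder = new CachedPathFinder(function (string $p): string {
    return $p;
});

// The file_exists-then-delete pattern resolves the same path twice...
$finder->findPathToUse('folder/file.txt');
$finder->findPathToUse('folder/file.txt');
// ...but the expensive lookup is counted only once.
var_dump($finder->lookups);
```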

@PVince81
Contributor Author

PVince81 commented May 18, 2016

@dragotin I fixed isAscii to be simpler and saved some indenting.

Also: changed the path finding algo by converting each path section one by one to find either the NFD or NFC form. This is required in case of people having crazy paths like "nfd/nfc/nfd".
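
The section-by-section approach could look roughly like the sketch below. It assumes the intl extension; `findPathBySections` and the `$exists` callback are hypothetical, and returning null when a section is not found is an assumption of this sketch rather than the merged behaviour.

```php
<?php
// Sketch only; requires PHP's intl extension for the Normalizer class.

/**
 * Hypothetical helper: resolve $fullPath one section at a time, trying the
 * NFC and NFD form of each section below the already resolved prefix.
 */
function findPathBySections(string $fullPath, callable $exists): ?string
{
    if ($fullPath === '') {
        return '';
    }
    $resolved = '';
    foreach (explode('/', $fullPath) as $section) {
        $prefix = $resolved === '' ? '' : $resolved . '/';
        $found = null;
        foreach ([Normalizer::FORM_C, Normalizer::FORM_D] as $form) {
            $name = Normalizer::normalize($section, $form);
            if ($name !== false && $exists($prefix . $name)) {
                $found = $prefix . $name;
                break;
            }
        }
        if ($found === null) {
            return null; // assumption: give up when neither form exists
        }
        $resolved = $found;
    }
    return $resolved;
}

// Stand-in storage with a "crazy" nfd/nfc/nfd layout.
$nfc = "\u{00FC}";
$nfd = "u\u{0308}";
$onStorage = [
    $nfd => true,
    "$nfd/$nfc" => true,
    "$nfd/$nfc/$nfd.txt" => true,
];
$exists = function (string $p) use ($onStorage): bool {
    return isset($onStorage[$p]);
};

// The caller asks with an all-NFC path; the mixed on-disk path comes back.
$result = findPathBySections("$nfc/$nfc/$nfc.txt", $exists);
var_dump($result === "$nfd/$nfc/$nfd.txt");
```

The cost is one existence check per form per section, which is why caching the resolved paths matters.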

  • TODO: some unit tests fail

Also need to redo a bit of manual testing as per the steps in the OP.

*
* @param string $fullPath path to check
*
* @return string original or converted path, or null if none of the forms was found
Contributor Author

missed fixing this comment

@PVince81
Contributor Author

Rebased.
Fixed the unit tests and also fixed many issues with the encoding wrapper itself.

Next up: redo the manual testing to tick all the TEST checkboxes above.

@PVince81
Contributor Author

Please review @owncloud/filesystem @nickvergessen @schiesbn

@@ -33,8 +33,12 @@ var MOUNT_OPTIONS_DROPDOWN_TEMPLATE =
' <option value="1" selected="selected">{{t "files_external" "Once every direct access"}}</option>' +
' </select>' +
' </div>' +
' <div class="optionRow">' +
' <input id="mountOptionsEncoding" name="encoding_compatibility" type="checkbox" value="true"/>' +
' <label for="mountOptionsEncoding">{{t "files_external" "Enable encoding compatibility (decreases performance)"}}</label>' +
Contributor

"Enable NFD encoding compatibility" or "Enable Mac Unicode encoding compatibility"?

Contributor Author

Yeah, I hesitated about that but you're right. In the end it's only about NFD/Mac.

Contributor

btw what is this {{t magic, does our language extraction tool detect that?

Contributor Author

it's a template function. Unfortunately our language extraction doesn't detect it, so there's an ugly workaround here: https://github.com/owncloud/core/pull/24349/files/e6f31f2107ed86e88527bb4ac2f60936c0e57e3a#diff-55300aafc99213431ff789e8b0fb70a7R13

I hope one day we can make it work... #15106

return $this->copy($this->findPathToUse($sourceInternalPath), $targetInternalPath);
}

$result = $this->storage->copyFromStorage($sourceStorage, $this->findPathToUse($sourceInternalPath), $targetInternalPath);
Contributor

$targetInternalPath should be de-normalized, not $sourceInternalPath since the target path is the one on this storage
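
A sketch of the direction this comment suggests: resolve the target path (which lives on the wrapped storage) and pass the source path through untouched. `RecordingStorage` and `EncodingSketch` are stand-in classes invented for this example, not ownCloud classes, and the lookup table replaces the real `findPathToUse` logic.

```php
<?php
// Sketch only; both classes are hypothetical stand-ins.

class RecordingStorage
{
    /** @var array list of [source, target] pairs it was asked to copy */
    public $calls = [];

    public function copyFromStorage($sourceStorage, $sourceInternalPath, $targetInternalPath)
    {
        $this->calls[] = [$sourceInternalPath, $targetInternalPath];
        return true;
    }
}

class EncodingSketch
{
    private $storage;
    private $knownPaths;

    public function __construct($storage, array $knownPaths)
    {
        $this->storage = $storage;
        $this->knownPaths = $knownPaths; // requested path => path on storage
    }

    private function findPathToUse($path)
    {
        return $this->knownPaths[$path] ?? $path;
    }

    public function copyFromStorage($sourceStorage, $sourceInternalPath, $targetInternalPath)
    {
        // Only the target belongs to the wrapped storage, so only it is resolved.
        return $this->storage->copyFromStorage(
            $sourceStorage,
            $sourceInternalPath,
            $this->findPathToUse($targetInternalPath)
        );
    }
}

$nfcTarget = "bad-\u{00FC}mlaut/new.txt";
$nfdTarget = "bad-u\u{0308}mlaut/new.txt";

$inner = new RecordingStorage();
$wrapper = new EncodingSketch($inner, [$nfcTarget => $nfdTarget]);
$wrapper->copyFromStorage(new stdClass(), 'source.txt', $nfcTarget);

var_dump($inner->calls[0][1] === $nfdTarget); // target was resolved to NFD
```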

Contributor Author

ok seems I need to add moar unit tests

@PVince81
Contributor Author

I fixed the code based on comments, please have another look @icewind1991 @nickvergessen

$result = $this->storage->rename($this->findPathToUse($path1), $this->findPathToUse($path2));
if ($result) {
unset($this->namesCache[$path1]);
unset($this->namesCache[$path2]);
Contributor

more unneeded unsets (and the 2 below)

Contributor Author

done

@PVince81
Contributor Author

Went through the test list and all went fine with SMB.
Many of the tests were done in two parts: first with file operations inside an NFD folder called "bad-ümlaut", then with operations directly on an NFD file called "bad-ümlaut.txt".

@PVince81
Contributor Author

Did a quick test with SFTP and it seems to work fine, too 😄

Vincent Petry added 5 commits May 20, 2016 09:33
  • The encoding wrapper is now only applied when the mount option is set, disabled by default.
  • Since new children from the storage might contain NFD entries, these must be normalized to NFC to be properly diff'ed with the cache contents, which are always NFC. This fixes an issue where NFD entries would disappear from the cache after rescanning for children.
  • Improved label
  • Fixed rename/copy/moveFromStorage/copyFromStorage and added tests
  • Improved findPathToUse algo
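
The scanner fix mentioned in the commits above (normalizing new children to NFC before diffing with the cache) can be illustrated with a short sketch. It assumes the intl extension; the arrays are made-up sample data and the variable names are not taken from scanner.php.

```php
<?php
// Sketch only; requires PHP's intl extension for the Normalizer class.

// The cache always holds NFC names.
$cachedChildren = ["bad-\u{00FC}mlaut", 'plain'];
// The storage listing reports the same folder with an NFD name.
$storageChildren = ["bad-u\u{0308}mlaut", 'plain'];

// Naive diff: the NFC cache entry looks like it vanished from the storage.
$naiveRemoved = array_diff($cachedChildren, $storageChildren);

// Normalize the listing to NFC first, then diff against the cache.
$normalizedChildren = array_map(function (string $name): string {
    return Normalizer::normalize($name, Normalizer::FORM_C);
}, $storageChildren);
$removed = array_diff($cachedChildren, $normalizedChildren);

var_dump(count($naiveRemoved)); // one entry wrongly flagged as removed
var_dump(count($removed));      // nothing flagged after normalizing
```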
@PVince81
Contributor Author

Rebased again. I hope this PR is acceptable now 😄
@icewind1991 @nickvergessen

@PVince81
Contributor Author

Test files are here: #21365 (comment)

@rullzer
Contributor

rullzer commented May 23, 2016

👍

@icewind1991
Contributor

👍

@PVince81 PVince81 merged commit bd87f67 into master May 23, 2016
@PVince81 PVince81 deleted the nfd-storagewrapper branch May 23, 2016 11:45
@lock

lock bot commented Aug 5, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Aug 5, 2019
Successfully merging this pull request may close these issues.

UTF-8 NFD file name on SMB storage cannot be accessed