Uploading files with Arabic file names and titles are converted to digits #624

Closed
2 of 3 tasks
pjayme opened this issue Aug 5, 2024 · 8 comments

Comments

@pjayme

pjayme commented Aug 5, 2024

Module version(s) affected

2.2.4

Description

When a file with an Arabic filename (for instance الطالب لعائلتك.docx) is uploaded into the CMS, the Name and the Title of the File record are converted from Arabic characters to a generated string of digits and Latin letters, which misrepresents the file:
[Screenshot: Screen Shot 2024-08-05 at 2 16 19 PM]

How to reproduce

  1. Get a sample file with Arabic characters in its file name, for instance الطالب لعائلتك.docx.
  2. In the CMS, go to the "Files" area and upload the file from the previous step.
  3. Once the file is uploaded, the Filename changes from الطالب لعائلتك.docx to a generated string of digits and Latin letters, for instance 66b034bf7ca15.docx.
    The title reflects that changed/sanitised filename, but should ideally reflect the original filename.

Possible Solution

No response

Additional Context

No response

Validations

  • Check that there isn't already an issue that reports the same bug
  • Double check that your reproduction steps work in a fresh installation of silverstripe/installer (with any code examples you've provided)

Acceptance Criteria

  • The title of the file reflects the original filename as it was before being modified by the filename filter.

PRs

@emteknetnz
Member

emteknetnz commented Aug 6, 2024

This is intended behavior; see https://github.com/silverstripe/silverstripe-assets/blob/2/src/FileNameFilter.php#L18

 * The default sanitizer is quite conservative regarding non-ASCII characters,
 * in order to achieve maximum filesystem compatibility.
 * In case your filesystem supports a wider character set,
 * or is case sensitive, you might want to relax these rules
 * via overriding {@link FileNameFilter_DefaultFilter::$default_replacements}.

You can override the defaults using YAML config as follows:

---
Name: myproject-filenamefilter-reset
---
SilverStripe\Assets\FileNameFilter:
  # Reset default replacements so that we do not merge with existing
  default_replacements: null
---
Name: myproject-filenamefilter
---
SilverStripe\Assets\FileNameFilter:
  default_replacements:
    # replace each whitespace character with a dash
    /\s/: '-'
    # remove duplicate underscores (since `__` is variant separator)
    /_{2,}/: '_'
    # remove duplicate dashes
    /-{2,}/: '-'
    # Remove all leading dots, dashes or underscores
    /^[-_\.]+/: ''
    # remove non-ASCII chars, only allow alphanumeric plus dash, dot, and underscore (disabled)
    # /[^-_A-Za-z0-9+.]+/: ''
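
To see what these rules do to the filename from this issue, here is a minimal Python sketch. It is an illustration only, not the Silverstripe PHP implementation: `apply_rules`, `RELAXED_RULES` and `ASCII_ONLY_RULE` are hypothetical names, and the sketch assumes the rules are applied one after another in the order listed. It does not model how the CMS arrives at a generated name such as 66b034bf7ca15.docx.

```python
import re

# Relaxed rules, in the order given in the YAML above: (pattern, replacement).
RELAXED_RULES = [
    (r"\s", "-"),        # replace each whitespace character with a dash
    (r"_{2,}", "_"),     # collapse runs of underscores into one
    (r"-{2,}", "-"),     # collapse runs of dashes into one
    (r"^[-_\.]+", ""),   # strip leading dots, dashes and underscores
]

# The ASCII-only rule that the YAML above leaves commented out.
ASCII_ONLY_RULE = (r"[^-_A-Za-z0-9+.]+", "")


def apply_rules(name, rules):
    """Apply each (pattern, replacement) pair to name, in order."""
    for pattern, replacement in rules:
        name = re.sub(pattern, replacement, name)
    return name


arabic_name = "الطالب لعائلتك.docx"

# Relaxed rules: the Arabic letters survive, only the space becomes a dash.
print(apply_rules(arabic_name, RELAXED_RULES))

# ASCII-only rule on its own: every Arabic letter and the space are removed.
print(apply_rules(arabic_name, [ASCII_ONLY_RULE]))
```

With the relaxed rules the name keeps its Arabic letters, while the ASCII-only rule on its own leaves nothing but `.docx`, which is why a conservative filter cannot preserve a name like this one.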

I cannot guarantee how this will behave in a production environment, or whether it will cause any issues.

@satrun77
Contributor

@GuySartorelli @emteknetnz would you accept a PR that prevents the title field from taking the modified file name 66b034bf7ca15? If so, can this issue be reopened, or should I create a new one?

@GuySartorelli
Member

@satrun77 What would that look like? And have you tried using the configuration Steve mentioned above? If you have, and it didn't resolve the problem, can you please describe what is stopping that config from resolving this problem for you?

@satrun77
Contributor

@GuySartorelli The configuration controls which characters are used in the file name transformation. I'm OK with that, and I understand why it is there. What I'm asking for is not about the file name but about the title field of the File object. Instead of using the new file name for the title, we could use the original file name. The title field would then be الطالب لعائلتك and the file name could stay 66b034bf7ca15.docx.
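
A minimal Python sketch of that idea, for illustration only. It is not the code from any PR and not the Silverstripe API: `build_file_fields` and `stand_in_sanitize` are hypothetical names, and the stand-in simply returns the generated name seen in this issue. The point is that the title is taken from the original name before any filtering happens.

```python
from pathlib import PurePath


def build_file_fields(original_name, sanitize):
    """Return hypothetical Name/Title values for an uploaded file.

    Name is the sanitised file name; Title is the original file name
    without its extension, taken before any filtering is applied.
    """
    return {
        "Name": sanitize(original_name),
        "Title": PurePath(original_name).stem,
    }


def stand_in_sanitize(_name):
    # Stand-in for the real filename filter: returns the generated
    # name reported in this issue instead of computing one.
    return "66b034bf7ca15.docx"


fields = build_file_fields("الطالب لعائلتك.docx", stand_in_sanitize)
print(fields["Name"])
print(fields["Title"])
```

Passing the sanitiser in as a parameter keeps the two concerns separate: the filter can stay as conservative as the filesystem needs, while the title remains readable.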

@GuySartorelli
Member

GuySartorelli commented Aug 21, 2024

Ahh, that makes sense. So you want the title to reflect the original file name, regardless of what the new file name is. I think that's sensible. I'll reopen this issue and update the title and description.

GuySartorelli reopened this Aug 21, 2024
GuySartorelli changed the title from "Uploading files with Arabic file names are converted to digits" to "Uploading files with Arabic file names and titles are converted to digits" Aug 21, 2024
@GuySartorelli
Member

Done, and added an acceptance criterion. If you're keen to do a PR I'd be very happy to review it.

@satrun77
Contributor

Quoting @GuySartorelli: "Done, and added an acceptance criterion. If you're keen to do a PR I'd be very happy to review it."

PR #633

@GuySartorelli
Member

PR merged.
This will be included in the October minor release.
