Allow characters in the emoji tag sequence in file names #1899
I suppose there is an inherent conflict between the desire to restrict file names to promote interoperability, and the desire to allow anything in file names to promote the expressivity of content authors. But if you name a file 🏴.opf, not all reading systems will be able to handle your EPUB.
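To make the concern concrete, here is a minimal Python sketch that lists the code points hiding behind a flag file name. The flag of England is used as the example of an emoji tag sequence; the helper name `describe_code_points` is hypothetical and is not part of any EPUB tool.

```python
import unicodedata

# Flag of England as an emoji tag sequence: the base emoji U+1F3F4,
# tag characters spelling "gbeng", and a closing U+E007F.
ENGLAND_FLAG = (
    "\U0001F3F4"
    "\U000E0067\U000E0062\U000E0065\U000E006E\U000E0067"
    "\U000E007F"
)


def describe_code_points(file_name: str) -> list[str]:
    """Return one 'U+XXXX NAME' string per code point in file_name."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in file_name
    ]


for line in describe_code_points(ENGLAND_FLAG + ".opf"):
    print(line)
```

What renders as a single glyph is seven code points, six of them in the Tags block, so a rule that excludes that block rejects the name even though it looks like one character.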
I wonder whether this warning should not be included in the text (as a note, of course).
But then shouldn't we just make this list of characters to avoid a best practice? I know we're trying to help authors avoid interop problems, but if that's critical then stick to the printable ASCII character set.
This is something I wonder if we need to actually test before making any concrete comments on, because in my experience our system, at least, falls over as soon as we get into special characters like punctuation in file names, let alone emoji.
Right, it just feels like we're doing something a bit tangential to EPUB itself, namely working out how all operating systems and applications will handle Unicode. We seem to have issues with this list every revision.
We discussed this issue in the Internationalization Working Group Teleconference yesterday, and we had a few concerns. First, a lot of Variation Selectors are excluded. Some of them are used in ideographic variation sequences and should be allowed. Second, there are unassigned code points in the tags block and they should not be excluded. We recommend either deleting this line (i.e., allowing all Tags and Variation Selectors Supplement code points including the deprecated characters) or only excluding the deprecated characters (i.e., |
For now, I've changed this to:
|
Looks good to me. Thank you!
The issue was discussed in a meeting on 2021-11-19.
2. Allow characters in the emoji tag sequence in file names (pr epub-specs#1899)
See github pull request epub-specs#1899.
Dave Cramer: historically EPUB has been focused on interop and we've had some limits on characters allowed in file names.
Matt Garrish: wasn't just emoji characters, it was languages with variation selectors (e.g. Mongolian script).
Dan Lazin: seems like the primary concern here is that we want to support this, but we're wary that it's not supported today and we don't want to give authors bad advice.
Ivan Herman: how would we test this?
Dan Lazin: in practice I think RS will support UTF-8 or not, but expecting that UTF-8 support will exist throughout the store ecosystem and legacy readers is hard.
Matt Garrish: are we restricting this because these file names might not work in certain North American stores? Can't the stores themselves decide their own policies for what they will allow?
Dave Cramer: agree, let's let Unicode be Unicode.
Romain Deltour: there are quite a few standards/APIs that define file names, some only exist as editor drafts or CG documents.
Rick Johnson: we seem to be saying this is a supply chain issue, can we pass this over to the business group? Meanwhile we let Unicode be Unicode.
Avneesh Singh: after getting such nice feedback from i18n, I think this is a sign that we should not be restrictive here. Maybe a note that these characters are now allowed, but that some RS may not support them. At least for this revision.
Matt Garrish: I'm almost positive that there's a note about zip tools that authors should stay within the ASCII range.
Murata Makoto: mgarrish where is this note you just referred to?
Matt Garrish: it's at the bottom of 6.1.3.
Dave Cramer: I think we should merge the PR. This part is uncontroversial. It satisfies i18n and keeps with our philosophy.
Matt Garrish: or can we just expand the existing note we were talking about just now?
Dan Lazin: I think we need a note that says caution when using Unicode characters.
Murata Makoto: I don't like that. It discourages non-ASCII characters.
Dan Lazin: I want to encourage the use, just not sure it is safe to do so today.
Murata Makoto: I've heard that argument for 20 years, haha. That argument endangers the use of non-ASCII characters.
Matt Garrish: if we change the restriction from MUST NOT to SHOULD NOT, would that work MURATA?
Murata Makoto: this issue is about emoji characters, if we start talking about non-ASCII we may be opening a can of worms.
Matt Garrish: I think the issue is just about what is allowed in file naming, and how restrictive the spec should be.
Dave Cramer: can we re-write above to avoid referring to US ASCII?
Murata Makoto: this issue is about emoji only, so why are we writing a note about non-ASCII?
Dan Lazin: #1899 just changes "don't use wide range of unicode" to "don't use the two deprecated characters".
Dave Cramer: mgarrish do you want to try to reword that note a little?
Matt Garrish: we'll open an issue about this note?
Dave Cramer: yes, please.
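The change summarized in the minutes, from excluding a whole range to excluding only deprecated characters, can be sketched in Python as two checks. This is an illustration under assumptions, not the spec's wording or any checker's implementation. The old rule is modeled as the Unicode Tags and Variation Selectors Supplement blocks. The thread does not name the two deprecated characters, so `ASSUMED_DEPRECATED` is a placeholder holding only U+E0001 LANGUAGE TAG, which Unicode marks as deprecated. Both function names are hypothetical.

```python
# Unicode block ranges (assumed here to model the old, range-based rule).
TAGS_BLOCK = range(0xE0000, 0xE0080)           # U+E0000..U+E007F
VS_SUPPLEMENT_BLOCK = range(0xE0100, 0xE01F0)  # U+E0100..U+E01EF

# Placeholder set: the discussion does not list the deprecated characters.
# U+E0001 LANGUAGE TAG is deprecated in Unicode and is used for illustration.
ASSUMED_DEPRECATED = frozenset({0xE0001})


def rejected_by_block_rule(file_name: str) -> list[str]:
    """Characters rejected when the whole blocks are excluded."""
    return [
        ch for ch in file_name
        if ord(ch) in TAGS_BLOCK or ord(ch) in VS_SUPPLEMENT_BLOCK
    ]


def rejected_by_deprecated_rule(
    file_name: str, deprecated: frozenset = ASSUMED_DEPRECATED
) -> list[str]:
    """Characters rejected when only the given deprecated set is excluded."""
    return [ch for ch in file_name if ord(ch) in deprecated]


# An ideographic variation sequence: a base ideograph plus U+E0100.
ivs_name = "\u845B\U000E0100.xhtml"
print(len(rejected_by_block_rule(ivs_name)))
print(len(rejected_by_deprecated_rule(ivs_name)))
```

Under the block rule the variation selector is rejected; under the narrower rule the same name passes, which is the behavior the i18n feedback asked for.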
Fixes #1885