Should the packaging spec explicitly disallow file name characters? #35

Closed
llemeurfr opened this issue Feb 2, 2019 · 4 comments

Comments

@llemeurfr
Contributor

The OCF specification lists a series of characters which cannot be used in file paths and file names.

The ISO 21320 specification, on which the WP packaging format will be based, does not formally prohibit any character in file paths / names. Instead, in an informative annex, it says that [The ZIP] "Appnote specifies few restrictions for filenames in the archive. For compatibility, this part of ISO/IEC 21320 does not require additional restrictions on filenames which are valid according to Appnote."
It then shows, as "known restrictions" in a comparative table, what JAR, Widget Zip, OOXML, OCF and Adobe UCF prohibit.

The ZIP Appnote has a field reserved for "file name" (4.4.17). It states that "The path stored MUST NOT contain a drive or device letter, or a leading slash. All slashes MUST be forward slashes '/' as opposed to backwards slashes '\' for compatibility with Amiga and UNIX file systems etc."
I also found Appendix D.2, which states that "the ZIP format has historically supported only the original IBM PC character encoding set, commonly referred to as IBM Code Page 437". But there is also a way to store a Unicode path in UTF-8, either in an "extra field" (4.6.9) or in the original field.

Conclusion: in this Appnote there is no mention of disallowed characters in file paths / names.
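
For illustration, the few rules the Appnote does state for the file name field can be sketched in Python. This is only a sketch, not a validator taken from any spec: the function names `appnote_path_problems` and `written_with_utf8_flag` are hypothetical, and the second function assumes the behaviour of Python's standard `zipfile` module, which sets general purpose bit 11 (0x800) when a name cannot be encoded as ASCII.

```python
import io
import re
import zipfile

# Matches a leading drive or device letter such as "C:".
_DRIVE_PREFIX = re.compile(r"^[A-Za-z]:")


def appnote_path_problems(name: str) -> list[str]:
    """Return the Appnote 4.4.17 rules that `name` breaks (empty if none)."""
    problems = []
    if _DRIVE_PREFIX.match(name):
        problems.append("drive or device letter")
    if name.startswith("/"):
        problems.append("leading slash")
    if "\\" in name:
        problems.append("backslash used as separator")
    return problems


def written_with_utf8_flag(name: str) -> bool:
    """Write `name` into an in-memory archive and report whether
    general purpose bit 11 (UTF-8 file name) is set on the entry."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(name, b"")
    with zipfile.ZipFile(buffer) as archive:
        return bool(archive.infolist()[0].flag_bits & 0x800)
```

Note that nothing here rejects a particular character: a name such as `[{"'!§$€%.x` passes `appnote_path_problems` untouched, which is the point made above.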

Therefore, we can't rely on ISO 21320 to limit the characters allowed in file paths / names. Either we keep (extend?) the OCF constraints, or we rely on authors (and operating systems) to avoid file paths / names which would break interoperability between systems. After all, this is what we already do when choosing a file name: I can name a file [{"'!§$€%.x on macOS, but would I try to move it to a Windows or Linux machine?
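
To make the interoperability concern concrete, here is a rough sketch of the kind of check an author or packaging tool could run on a single file name. It is deliberately not the OCF list: as an assumption for the example, it only flags the characters Windows reserves in file names, control characters, and a trailing dot or space. The name `portability_issues` is hypothetical.

```python
import unicodedata

# Characters Windows reserves in file names; "/" is also the ZIP separator.
WINDOWS_RESERVED = set('<>:"/\\|?*')


def portability_issues(file_name: str) -> list[str]:
    """List reasons why a single file name (not a path) may not port well."""
    issues = []
    reserved = sorted(WINDOWS_RESERVED.intersection(file_name))
    if reserved:
        issues.append("reserved on Windows: " + " ".join(reserved))
    # Unicode category "Cc" covers the C0 and C1 control characters and DEL.
    if any(unicodedata.category(ch) == "Cc" for ch in file_name):
        issues.append("contains control characters")
    if file_name.endswith((".", " ")):
        issues.append("ends with a dot or space")
    return issues


if __name__ == "__main__":
    for candidate in ["chapter-01.xhtml", '[{"\'!§$€%.x', "notes."]:
        print(repr(candidate), portability_issues(candidate))
```

Under these assumptions the macOS name above is flagged only for its quotation mark; the other characters in it are accepted by this particular check.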

@iherman
Member

iherman commented Feb 3, 2019

I am sympathetic to pragmatism. It is good to be precise in the spec, but we have to be careful not to cast in concrete features that will keep evolving around us. We may impose restrictions that are no longer relevant in a few years (so that people will, in fact, ignore them).

That being said, having some sort of informative reference and guidance would be good. We could/should refer to the OCF document informally if that serves the purpose. We could also consider referring to the URL specification in terms of path names (e.g., the WHATWG URL spec or the IRI spec). Indeed, if a packaged WP has to be used on the Web, then the usage of the file names as part of URLs is also something to consider.

To be honest, I am at a loss myself on that matter, i.e., whether URLs conflict with the OCF spec or not, etc. I know that I am free to use crazy filenames on a Mac, but I have no idea what the situation is on recent Windows, on Chrome, on Linux, etc. I must admit (but possibly because I have gray hair) that when I create files that are supposed to be used as part of URLs, for example, I still restrict myself to ASCII...

@llemeurfr
Contributor Author

The LPF draft contains the following note:
The [ZIP] specification has few constraints on the characters allowed for file and directory names. When crafting such names, authors must be careful to use characters which allow a broad interoperability among operating systems and are compatible with relative URLs.
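
As a sketch of what "compatible with relative URLs" can mean in practice, the following uses `urllib.parse.quote` from the Python standard library to percent-encode a ZIP entry path and to report whether the path could be used verbatim. The helper names `as_relative_url` and `needs_escaping` are hypothetical, and `quote` is more conservative than the URL specifications strictly require (it also escapes characters such as `!` and `'`), so this errs on the side of caution.

```python
from urllib.parse import quote, unquote


def as_relative_url(path: str) -> str:
    """Percent-encode a ZIP entry path for use as a relative URL,
    keeping "/" as the segment separator."""
    return quote(path, safe="/")


def needs_escaping(path: str) -> bool:
    """True when the path cannot be used verbatim as a relative URL."""
    return as_relative_url(path) != path


if __name__ == "__main__":
    for entry in ["chapters/ch01.xhtml", "chapters/01 intro.xhtml", "café.html"]:
        print(entry, "->", as_relative_url(entry), needs_escaping(entry))
```

Names made of ASCII letters, digits, `-`, `_`, `.` and `~` pass through unchanged, which matches the habit of sticking to ASCII for files that end up in URLs.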

@llemeurfr
Contributor Author

Agreed to replace "must" by "should" (non-normative).

@iherman
Member

iherman commented May 21, 2019

This issue was discussed in a meeting.

  • RESOLVED: The [ZIP] specification has few constraints on the characters allowed for file and directory names. When crafting such names, authors should be careful to use characters which allow a broad interoperability among operating systems and are compatible with relative URLs.
Transcript: Should the packaging spec explicitly disallow file name characters?
Laurent Le Meur: #35
Laurent Le Meur: Let's start with #35. The issue is about whether to explicitly disallow file name characters in the specification. There were few comments, and I put in the draft the sentence that's in the last comment of the thread. I feel it is sufficient.
… if there are no unhappy comments, we can close the issue. The resolution would be to just quote the zip specifications - which has only a few constraints on the file and directory names - no use of explicit characters.
Nick Ruffilo: +1
Garth Conboy: That looks fine to me, and I presume that ‘must’ probably should be a should in what you wrote. It’s not a normative statement either way.
Proposed resolution: The [ZIP] specification has few constraints on the characters allowed for file and directory names. When crafting such names, authors should be careful to use characters which allow a broad interoperability among operating systems and are compatible with relative URLs. (Laurent Le Meur)
Nick Ruffilo: +1
Luc Audrain: +1
Wendy Reid: +1
Garth Conboy: +1
Geoff Jukes: +1
Tim Cole: +1
Marisa DeMeglio: +1
Resolution #2: The [ZIP] specification has few constraints on the characters allowed for file and directory names. When crafting such names, authors should be careful to use characters which allow a broad interoperability among operating systems and are compatible with relative URLs.
Laurent Le Meur: I will modify the draft then close.
