-
Notifications
You must be signed in to change notification settings - Fork 402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: improve checking of OCF file name characters #1408
Conversation
- significantly refactored `OCFFilenameChecker`: - now based on ICU4J for more up-to-date Unicode support - iterate over the input string codepoints (`int`) instead of code units (`char`) for better support of charactres outside the BMP. - rely on ICU UnicodeSet to define the set of forbidden characters - improve the message reported for forbidden characters (notably for non-printable characters) - the class now implements the `Checker` interface - add unit tests for the `OCFFilenameChecker` only in a new feature file `filename-checker.feature`, located along with the EPUB 3 OCF functional tests, for easier lookup. - add a few functional tests, to make sure `OCFFilenameChecker` is correctly called when checking full publications or single package documents. - update ICU4J to v72.1 Fixes #1240, fixes #1292, fixes #1302
Hm, I can't even check out this branch because of the file names:
Is this going to make the repository unusable on windows if it gets merged? |
ou damn. Are you able to checkout the repo in subsequent commits? |
No, I kept getting the same error messages about the files in the one I tried. |
Can you retry checking out after setting:
I'd like to avoid rewriting the history of released code if possible, even if they are only previews 🤞 |
That lets me check out the branch, yes. git still can't create the files so I can't build epubcheck, but if you plan to fix that in a future update it should be fine. |
good.
If it's fine with you, temporarily, then yes I'd prefer to fix that in a future update. In the mean time, you should be able to build if you either:
Regarding the fix, I'm wondering if packaging the EPUB will be enough, since EPUBCheck unpacks it at runtime, and Windows/NTFS might complain… fun times ahead 😊 |
Ya, that's a recipe for disaster! Do you need to test the physical files? I see there's a similar ocf-filename-character-forbidden-error.opf file performing the same check. Maybe that's enough in this case. |
yes, it's not the same code branch:
Doing only the latter is not optimal since files can be in the container without being listed in the package document. So the tests ideally need to cover both cases. I'll try to find a workaround. If I can't find any, it's not that big a deal to prune the container scenarios, since the file name checker is unit-tested anyway. |
I opened issue #1420 and PR #1421 to fix the test suite. @mattgarrish can you approve this one once you confirm #1421 fixes the checkout+testing issue? |
OCFFilenameChecker
:int
) instead of codeunits (
char
) for better support of charactres outside the BMP.non-printable characters)
Checker
interfaceOCFFilenameChecker
only in a new featurefile
filename-checker.feature
, located along with the EPUB 3 OCFfunctional tests, for easier lookup.
OCFFilenameChecker
iscorrectly called when checking full publications or single package
documents.
Fixes #1240, fixes #1292, fixes #1302