Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify UTF8 encoding. #141

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Conversation

adamoutler
Copy link

This fixes a problem with UTF-16 chars being read as less number of chars than expected which results in an unclosed expression expecting a closing single quote.

Java 1.7 is no longer supported.  Java 11 is currently supported in LTS, and Java 17 will be the new LTS in September.  It's time to bump the version.
This fixes a problem with UTF-16 chars being read as less number of chars than expected which results in an unclosed expression expecting a closing single quote.
@vidstige
Copy link
Owner

vidstige commented Feb 4, 2022

Thanks for your contribution! Can you expand a bit on what is going wrong exactly? Perhaps I can add a unit test case for this as well.

@adamoutler
Copy link
Author

Not needed. This is a best practice and recommended by static analysis tools. The system defaults to whatever the system default is set to. This constraints it to UTF-8 which is recommended. In the case the system is set to UTF-16, emojis fail.

@adamoutler
Copy link
Author

It's preparation for when this repo upgrades to Java 11/16.

@vidstige
Copy link
Owner

Please rebase your branch on top of latest master to trigger a build 🙏

@vidstige
Copy link
Owner

  • There seems to be lot's of unrelated formatting, etc. Please add these in separate commits if needed at all.
  • How wide-spread is Java 11? What other versions would it be reasonable to upgrade to? Keep in mind that this is a library, so requiring too high version will prevent people from upgrading.

@janosvitok
Copy link
Collaborator

I've quickly run over this PR.

I think the only relevant change is the one line with addittion of StandardCharsets.UTF_8. The rest should be reverted.
StandardCharsets is part of Java 7, so there's no need to upgrade.

It's strange that @adamoutler removed tests for strings with diacritics -- that's where UTF-8 is really used.

System.out.println is for temporary debugging and should have been removed prior to commit/PR.

Other than that, I agree that it is safer to explicitly specify encoding.

@adamoutler
Copy link
Author

Yes. I guess I included items which should not have been in there. The UTF-8 part was discovered as the root of the other problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants