maxCodePoints / minCodePoints (UTF-32 code points) #875

Open
tats-u opened this issue Oct 17, 2024 · 7 comments · May be fixed by #888
Labels
enhancement New feature or request

Comments

@tats-u

tats-u commented Oct 17, 2024

In some RDBs, the VARCHAR length limit is counted in Unicode code points (equivalent to UTF-32 code units).
maxLength counts UTF-16 code units, so an emoji and some kanji count as two.
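
To illustrate the mismatch, here is a minimal sketch comparing the UTF-16 length with the code point count. `codePointLength` is a hypothetical helper written for this example, not an existing library function; it relies on `Array.from` iterating a string by code point.

```typescript
// Hypothetical helper (not part of any library): counts Unicode code points.
// Array.from iterates a string by code point, so a surrogate pair counts once.
function codePointLength(text: string): number {
  return Array.from(text).length;
}

const emoji = "\u{1F600}"; // U+1F600, an emoji outside the BMP
const kanji = "\u{20BB7}"; // U+20BB7, a kanji outside the BMP

console.log(emoji.length, codePointLength(emoji)); // 2 1
console.log(kanji.length, codePointLength(kanji)); // 2 1
console.log("abc".length, codePointLength("abc")); // 3 3
```

A column limited to N code points can therefore hold strings whose `.length` is up to 2N.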

Password requirements by NIST:

https://pages.nist.gov/800-63-3/sp800-63b.html

Unicode [ISO/ISC 10646] characters SHOULD be accepted as well. To make allowances for likely mistyping, verifiers MAY replace multiple consecutive space characters with a single space character prior to verification, provided that the result is at least 8 characters in length. Truncation of the secret SHALL NOT be performed. For purposes of the above length requirements, each Unicode code point SHALL be counted as a single character.

This requires us to count an emoji (other than compound ones) or any other 4-byte character as one character in a password.
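
A short sketch of why this matters for the minimum length: a check based on `.length` can accept a password that has fewer than 8 code points. `hasMinCodePoints` and `NIST_MIN_CHARS` are names invented for this example.

```typescript
// Minimum taken from the quoted NIST passage ("at least 8 characters").
const NIST_MIN_CHARS = 8;

// Hypothetical helper: true if the secret has at least `min` code points.
function hasMinCodePoints(secret: string, min: number): boolean {
  return Array.from(secret).length >= min;
}

// Four emoji: 4 code points, but 8 UTF-16 code units.
const fourEmoji = "\u{1F600}".repeat(4);

console.log(fourEmoji.length >= NIST_MIN_CHARS); // true (counts UTF-16 units)
console.log(hasMinCodePoints(fourEmoji, NIST_MIN_CHARS)); // false (counts code points)
```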

@tats-u tats-u changed the title maxCodePoints / minCodePoints (UTF-32) maxCodePoints / minCodePoints (UTF-32 code points) Oct 17, 2024
@fabian-hiller
Owner

fabian-hiller commented Oct 17, 2024

@fabian-hiller fabian-hiller self-assigned this Oct 17, 2024
@fabian-hiller fabian-hiller added the question Further information is requested label Oct 17, 2024
@tats-u
Author

tats-u commented Oct 17, 2024

@fabian-hiller

new grapheme actions

A single grapheme can contain an unbounded number of code points (and therefore of UTF-16 code units), so you should combine maxGraphemes with maxCodePoints or maxLength.

https://stackoverflow.com/questions/71011343/maximum-number-of-codepoints-in-a-grapheme-cluster
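
As a rough illustration, the sketch below builds a string that is a single grapheme under the Unicode extended grapheme cluster rules (a base letter followed by combining marks) yet contains 51 code points. `countCodePointsIn` and `MAX_CODE_POINTS` are hypothetical names; the limit of 20 is an arbitrary value chosen for the example.

```typescript
// Hypothetical helper: counts Unicode code points.
function countCodePointsIn(text: string): number {
  return Array.from(text).length;
}

// "e" followed by 50 combining acute accents (U+0301).
// Combining marks attach to the preceding base character,
// so the whole string forms one grapheme cluster.
const stacked = "e" + "\u0301".repeat(50);

console.log(countCodePointsIn(stacked)); // 51
console.log(stacked.length); // 51 (all code points are in the BMP)

// A grapheme limit alone would see one "character" here;
// an additional code point limit catches it.
const MAX_CODE_POINTS = 20; // arbitrary limit for illustration
console.log(countCodePointsIn(stacked) <= MAX_CODE_POINTS); // false
```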

@tats-u
Author

tats-u commented Oct 17, 2024

If you write your backend in Go or Rust, counting code points (UTF-32 length) is more common than counting UTF-16 code units (utf8.RuneCountInString(str) or str.chars().count()).

@fabian-hiller
Owner

Thank you for your detailed feedback! How would you implement such an action? We also have byte actions like maxBytes, but I'm not sure if this is what you are looking for.

@tats-u
Author

tats-u commented Oct 19, 2024

We can implement it based on the existing maxLength. Compare the result of codePointAt per character with 0x10000 and move the cursor forward by one more character if necessary.

You can combine maxBytes with other actions too. For example, a password can't be longer than 72 bytes if you hash it with bcrypt, and that byte limit is compatible with maxCodePoints or maxLength.
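
A minimal sketch of combining the two limits, assuming well-formed strings: at least 8 code points and at most 72 UTF-8 bytes. `utf8ByteLength` and `isAcceptablePassword` are hypothetical helpers written for this example and are not part of any library; the byte count is derived from code point ranges so the block has no runtime dependencies.

```typescript
// Hypothetical helper: UTF-8 byte length computed from code point ranges.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; ) {
    const cp = text.codePointAt(i)!;
    if (cp < 0x80) bytes += 1;
    else if (cp < 0x800) bytes += 2;
    else if (cp < 0x10000) bytes += 3;
    else bytes += 4;
    // Code points above U+FFFF occupy two UTF-16 code units.
    i += cp > 0xffff ? 2 : 1;
  }
  return bytes;
}

// Hypothetical check: 8+ code points (NIST) and at most 72 bytes (bcrypt).
function isAcceptablePassword(password: string): boolean {
  const codePoints = Array.from(password).length;
  return codePoints >= 8 && utf8ByteLength(password) <= 72;
}

console.log(isAcceptablePassword("password")); // true
console.log(isAcceptablePassword("\u{1F600}".repeat(18))); // true (72 bytes)
console.log(isAcceptablePassword("\u{1F600}".repeat(19))); // false (76 bytes)
```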

@fabian-hiller
Owner

Can you provide a code example for the if-statement to check the maximum code points?

@tats-u tats-u linked a pull request Oct 20, 2024 that will close this issue
@tats-u
Author

tats-u commented Oct 20, 2024

Do you mean this?

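// input is the string being validated
let count = 0;
let i = 0;
while (i < input.length) {
  const codePoint = input.codePointAt(i)!;
  if (codePoint <= 65535) {
    i++; // BMP code point: 1 UTF-16 code unit
  } else {
    i += 2; // 2 characters (surrogate pair) in JS (UTF-16)
  }
  count++;
}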
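
For context, the fragment above can be turned into a standalone check along these lines. This is only a sketch under assumptions: `exceedsCodePoints` is a hypothetical name, and it is not necessarily how the linked pull request implements the action. It stops as soon as the limit is passed, so long inputs are not scanned in full.

```typescript
// Hypothetical helper: true if `input` has more than `max` code points.
function exceedsCodePoints(input: string, max: number): boolean {
  let count = 0;
  for (let i = 0; i < input.length; ) {
    const codePoint = input.codePointAt(i)!;
    // Code points above U+FFFF are stored as a surrogate pair (2 units).
    i += codePoint > 0xffff ? 2 : 1;
    count++;
    // Early exit: no need to count the rest of the string.
    if (count > max) return true;
  }
  return false;
}

console.log(exceedsCodePoints("\u{1F600}\u{1F600}", 2)); // false (2 code points)
console.log(exceedsCodePoints("abc", 2)); // true (3 code points)
```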

@fabian-hiller fabian-hiller added enhancement New feature or request and removed question Further information is requested labels Oct 20, 2024