Can create invalid unicode strings #37

Open
nijel opened this issue Mar 11, 2020 · 0 comments

nijel commented Mar 11, 2020

Many phones use surrogates to encode higher-plane Unicode characters, and these get passed through Gammu and python-gammu into Python unicode strings. The problem is that surrogates are not allowed there, so doing something with such a string ends up in:

UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 10316: surrogates not allowed
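
The error can be reproduced without a phone. A minimal sketch, assuming only standard CPython 3 behavior; the sample string and the helper `try_encode` are made up for illustration and do not come from Gammu or python-gammu:

```python
# Hypothetical sample: a lone high surrogate embedded in otherwise normal text.
text = "abc\ud800def"


def try_encode(s):
    """Return the UTF-8 bytes, or the error message if encoding fails."""
    try:
        return s.encode("utf-8")
    except UnicodeEncodeError as exc:
        return str(exc)


message = try_encode(text)
print(message)

# Lossy workaround on the Python side: unencodable characters become "?".
safe = text.encode("utf-8", errors="replace")
print(safe)
```

The `errors="replace"` call only hides the symptom on the caller's side; the string handed over by the binding still contains the surrogate.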

There has to be some bug in the surrogate conversion code:

/* Convert string without zero at the end. */
*out_len = 0;
for (i = 0; i < len; i++) {
    value = (src[2 * i] << 8) + src[(2 * i) + 1];
    if (value >= 0xD800 && value <= 0xDBFF) {
        second = src[(i + 1) * 2] * 256 + src[(i + 1) * 2 + 1];
        if (second >= 0xDC00 && second <= 0xDFFF) {
            value = ((value - 0xD800) << 10) + (second - 0xDC00) + 0x010000;
            i++;
        } else if (second == 0) {
            /* Surrogate at the end of string */
            value = 0xFFFD; /* REPLACEMENT CHARACTER */
        }
    }
    dest[(*out_len)++] = value;
}
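
To see which inputs the loop above lets through, here is a rough Python model of it. This is a sketch for reasoning about the logic, not the Gammu code: the names `decode_units` and `decode_units_strict` are hypothetical, the input is a list of 16-bit code units rather than a big-endian byte buffer, and a read past the end is modeled as 0, which assumes the source buffer is zero-terminated.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def decode_units(units):
    """Model of the quoted loop: 16-bit code units in, code points out."""
    out = []
    i = 0
    while i < len(units):
        value = units[i]
        if 0xD800 <= value <= 0xDBFF:
            # Assumption: reading past the end yields the terminating zero.
            second = units[i + 1] if i + 1 < len(units) else 0
            if 0xDC00 <= second <= 0xDFFF:
                value = ((value - 0xD800) << 10) + (second - 0xDC00) + 0x10000
                i += 1
            elif second == 0:
                value = REPLACEMENT
            # Any other `second` leaves the lone high surrogate in place.
        out.append(value)
        i += 1
    return out


def decode_units_strict(units):
    """Same loop, but every unpaired surrogate becomes U+FFFD."""
    out = []
    i = 0
    while i < len(units):
        value = units[i]
        if 0xD800 <= value <= 0xDBFF:
            second = units[i + 1] if i + 1 < len(units) else 0
            if 0xDC00 <= second <= 0xDFFF:
                value = ((value - 0xD800) << 10) + (second - 0xDC00) + 0x10000
                i += 1
            else:
                value = REPLACEMENT
        elif 0xDC00 <= value <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            value = REPLACEMENT
        out.append(value)
        i += 1
    return out


valid_pair = [0xD83D, 0xDE00]        # encodes U+1F600
high_then_letter = [0xD800, 0x0041]  # high surrogate followed by "A"
lone_low = [0xDC00]                  # low surrogate on its own

print([hex(v) for v in decode_units(valid_pair)])
print([hex(v) for v in decode_units(high_then_letter)])
print([hex(v) for v in decode_units(lone_low)])
print([hex(v) for v in decode_units_strict(high_then_letter)])
```

In this model, a high surrogate followed by an ordinary character and a low surrogate with no preceding high surrogate both come out unchanged, which would be consistent with the '\ud800' in the traceback. Whether either case is what actually happens with real phone data is not something the sketch can confirm.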

Or there is some other way this can slip through. I've seen this in Text as returned by DecodePDU.

nijel added the bug label Mar 11, 2020