Fix infinite loop when using characters from unicode supplementary pl… #828

Ratio2 · 2019-09-22T11:06:41Z

…anes ('🔈' for example)

Fixes # .

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the existing documentation
My changes generate no new warnings
I have updated the change log (Add/Change/Fix)
I have cleaned up the commit history (use rebase and squash)

Changes proposed in this pull request:

Liryna · 2019-09-24T18:36:29Z

Thank you for the PR. There are a couple of changes in your PR probably not all related to the issue.
Can you clean it ?
I imagine you are talking about put_utf8 for the infinite loop ?

…anes ('🔈' for example)

Ratio2 · 2019-09-24T19:46:22Z

@Liryna Make cleanup as you requested.

Liryna · 2019-09-29T18:25:52Z

Hi @Ratio2 ,

I have tried to reproduce your issue with a small sample outside dokan_fuse and could not reproduce the infinite loop.

 unsigned char src[] = "🔈";
 unsigned char dst[16];
 convert_char(get_utf16, put_utf8, src, 2, dst);

Am I doing something different ?

Ratio2 · 2019-09-29T18:30:03Z

@Liryna I reproduce bug in encfs bundle: https://github.com/jetwhiz/encfs4win/releases/download/v1.10.1/encfs-installer.exe, but bug in dokan itself.

Liryna · 2019-09-29T18:31:48Z

Hi @jetwhiz,

Do you have any knowledge about such issue ?

Ratio2 · 2019-09-29T18:36:26Z

@Liryna Bug occured when we convert utf-16 to utf-8. We got 4 bytes, but with invalid value. When we decode this utf-8, only 2 bytes consumed instead 4, second 2 bytes return error with negative result and str len increased (negative cast to big number => undefined behavior) instead reducing.

Liryna · 2019-09-29T18:38:37Z

Do you have sample data (src buffer) that I could use to reproduce ?

Ratio2 · 2019-09-29T18:39:32Z

@Liryna I don't report this issue to encfs team. Just find problem and fix it in dokan ;)

Ratio2 · 2019-09-29T19:02:59Z

@Liryna https://pastebin.com/YzwHM948

Ratio2 · 2019-09-29T19:14:40Z

@Liryna Sorry, my test was bad, i fixed it:

int main()
{
	char utf8[5] = "\xf0\x9f\x94\x88";
	wchar_t utf8_to_utf16[3] = {};
	auto res = convert_char(get_utf8, put_utf16, utf8, 4, utf8_to_utf16);
	assert(res == 4);
	char utf16_to_utf8[5] = {};
	res = convert_char(get_utf16, put_utf8, utf8_to_utf16, 4, utf16_to_utf8);
	assert(res == 4);
	assert(!strcmp(utf8, utf16_to_utf8));
}

Ratio2 · 2019-09-29T19:25:01Z

@Liryna One more simple test:

int main()
{
	wchar_t utf16[3] = L"\xd83d\xdd08";
	char utf8[5] = {};
	auto res = convert_char(get_utf16, put_utf8, utf16, 4, utf8);
	assert(res == 4);
	assert(utf8[0] != 0); // wtf?
}

Liryna · 2019-09-29T20:25:28Z

Thank you very much @Ratio2 ! 🏆

I am not the author of the dokan_fuse. I was wondering if you understood why the author used the condition if (c < 0x40) ?

Ratio2 · 2019-09-29T20:44:22Z

@Liryna
We have unicode character with code 0x1f508: https://www.unicode.org/charts/PDF/U1F300.pdf
In bits ‭11111010100001000‬ - 17 bits.
According to utf-8 we can encode 16 bits to 3 bytes and 21 bits to 4 bytes: https://en.wikipedia.org/wiki/UTF-8#Description
Split to 4 byte utf-8: 000 011111 010100 001000. First byte is zero, but it must be exists, author of original code make mistake assuming that first byte can not be zero and remain garbage in first byte ;)

Ratio2 · 2019-09-29T20:52:13Z

@Liryna Condition if (c < 0x40) check if remain 6 bits or less (max bits per utf-8 byte if encoded more then one).

Liryna · 2019-09-30T18:43:53Z

Thank you @Ratio2 this explanation is very helpful 👍
More special thank for your PR, please do not hesitate if you find anything else 🚀

jetwhiz · 2019-10-01T00:15:38Z

Thanks @Ratio2 and @Liryna !

I wonder if this is related to the issue regarding Thai characters eating up 100% CPU resources? jetwhiz/encfs4win#94 Do Thai characters fall into this category?

Liryna · 2019-10-01T06:28:40Z

Hi @jetwhiz

http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=%E0%B8%8B%E0%B9%88&mode=char
utf8 code also starts with a zero so therefore the same case as @Ratio2 reported.
I just made and test and I can confirm. Previous code corrupted the character but with @Ratio2 fix, we get the correct value 👍

Fix infinite loop when using characters from unicode supplementary pl…

bf4979b

…anes ('🔈' for example)

Liryna merged commit 08502af into dokan-dev:master Sep 30, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix infinite loop when using characters from unicode supplementary pl… #828

Fix infinite loop when using characters from unicode supplementary pl… #828

Ratio2 commented Sep 22, 2019

Liryna commented Sep 24, 2019

Ratio2 commented Sep 24, 2019

Liryna commented Sep 29, 2019

Ratio2 commented Sep 29, 2019 •

edited

Loading

Liryna commented Sep 29, 2019

Ratio2 commented Sep 29, 2019 •

edited

Loading

Liryna commented Sep 29, 2019

Ratio2 commented Sep 29, 2019

Ratio2 commented Sep 29, 2019 •

edited

Loading

Ratio2 commented Sep 29, 2019 •

edited by Liryna

Loading

Ratio2 commented Sep 29, 2019 •

edited by Liryna

Loading

Liryna commented Sep 29, 2019

Ratio2 commented Sep 29, 2019 •

edited

Loading

Ratio2 commented Sep 29, 2019 •

edited

Loading

Liryna commented Sep 30, 2019

jetwhiz commented Oct 1, 2019

Liryna commented Oct 1, 2019

Fix infinite loop when using characters from unicode supplementary pl… #828

Fix infinite loop when using characters from unicode supplementary pl… #828

Conversation

Ratio2 commented Sep 22, 2019

Checklist

Changes proposed in this pull request:

Liryna commented Sep 24, 2019

Ratio2 commented Sep 24, 2019

Liryna commented Sep 29, 2019

Ratio2 commented Sep 29, 2019 • edited Loading

Liryna commented Sep 29, 2019

Ratio2 commented Sep 29, 2019 • edited Loading

Liryna commented Sep 29, 2019

Ratio2 commented Sep 29, 2019

Ratio2 commented Sep 29, 2019 • edited Loading

Ratio2 commented Sep 29, 2019 • edited by Liryna Loading

Ratio2 commented Sep 29, 2019 • edited by Liryna Loading

Liryna commented Sep 29, 2019

Ratio2 commented Sep 29, 2019 • edited Loading

Ratio2 commented Sep 29, 2019 • edited Loading

Liryna commented Sep 30, 2019

jetwhiz commented Oct 1, 2019

Liryna commented Oct 1, 2019

Ratio2 commented Sep 29, 2019 •

edited

Loading

Ratio2 commented Sep 29, 2019 •

edited

Loading

Ratio2 commented Sep 29, 2019 •

edited

Loading

Ratio2 commented Sep 29, 2019 •

edited by Liryna

Loading

Ratio2 commented Sep 29, 2019 •

edited by Liryna

Loading

Ratio2 commented Sep 29, 2019 •

edited

Loading

Ratio2 commented Sep 29, 2019 •

edited

Loading