[jvm] Pattern matching involving unicode #10720

kevinresol · 2022-06-03T05:08:24Z

v4.2.5

function foo() return '😀 😀';

switch (foo()) {
	case '😀 😀':
		trace('yo');
	case v:
		trace('meh');
}

This prints "yo" on Node.js and "meh" on the JVM target.

Simn · 2022-06-03T07:04:41Z

This might be about the hashing being used. IIRC we determine a hash at compile-time and compare it to the one from run-time.

kevinresol · 2022-06-04T12:11:40Z

Java treats each emoji as having length 2, so I guess that's where the issue comes from:

function foo() return '😀 😀';
function bar() return 'abc';
function baz() return '名 字';

function main() {
	trace(hashCode(foo()), ((cast foo():java.lang.Object).hashCode()), foo().length);
	trace(hashCode(bar()), ((cast bar():java.lang.Object).hashCode()), bar().length);
	trace(hashCode(baz()), ((cast baz():java.lang.Object).hashCode()), baz().length);
}

function hashCode(value:String) {
	var h = 0;
	
	if(value.length > 0) {
		for(i in 0...value.length) {
			h = 31 * h + value.charCodeAt(i);
		}
	}
	
	return h;
}

prints:

src/Main.hx:8: 1278630208, 1278630208, 5
src/Main.hx:9: 96354, 96354, 3
src/Main.hx:10: 20702212, 20702212, 3

Aurel300 · 2022-06-04T17:01:44Z

Java uses a modified UTF-8 encoding, described here. In particular, code points above U+FFFF are encoded as two surrogate code units, each encoded separately, instead of the standard 4-byte UTF-8 sequence representing a single code point.
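
To make the surrogate point concrete, here is a minimal Java sketch comparing two ways of hashing the same string. `hashByCodeUnits` walks UTF-16 code units, which is how `String.hashCode` is specified. `hashByCodePoints` is a hypothetical variant that walks whole code points. Both helper names and the class name `HashDemo` are made up for illustration. This is not the compiler's actual code, only an assumption about the kind of mismatch that could make a compile-time hash differ from the run-time one.

```java
public class HashDemo {
    // Hash over UTF-16 code units, matching the documented String.hashCode formula.
    static int hashByCodeUnits(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Hypothetical variant (an assumption, not the compiler's code):
    // hash over Unicode code points, so each emoji contributes one value.
    static int hashByCodePoints(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            h = 31 * h + cp;
            i += Character.charCount(cp); // skip both halves of a surrogate pair
        }
        return h;
    }

    public static void main(String[] args) {
        // U+1F600, a space, U+1F600, written as surrogate pairs to avoid source-encoding issues.
        String emoji = "\uD83D\uDE00 \uD83D\uDE00";
        String cjk = "\u540D \u5B57"; // every character is inside the BMP

        System.out.println(emoji.length());                                        // 5 code units
        System.out.println(hashByCodeUnits(emoji) == emoji.hashCode());            // true
        System.out.println(hashByCodePoints(emoji) == emoji.hashCode());           // false
        System.out.println(hashByCodePoints(cjk) == cjk.hashCode());               // true
    }
}
```

The two functions agree for strings that stay within the Basic Multilingual Plane, which would be consistent with only the emoji case failing to match.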

Simn · 2022-06-07T07:19:17Z

Fortunately, I found a java_hash implementation in genjava which deals with this stuff.

Simn self-assigned this Jun 3, 2022

Simn added the platform-jvm Everything related to JVM label Jun 3, 2022

Simn closed this as completed in e239cf0 Jun 7, 2022

skial mentioned this issue Jun 8, 2022

Haxe Roundup 631 skial/haxe.io#985
