Consider removing napi_get_value_string_length #226

jasongin · 2017-04-12T20:44:04Z

I regret adding this API now, because it doesn't serve much purpose, and is only likely to cause confusion and bugs. The intention was that this would return the number of "characters" in a string independent of encoding, but that's not generally useful. In almost all cases, one of the encoding-specific napi_get_value_string_* APIs is more correct. (Pass a null buffer if only the encoded length is desired.)

Anyway the current V8 implementation of napi_get_value_string_length() is technically wrong: it returns the number of 2-byte code units of the UTF-16 encoding, but some characters are actually encoded as two UTF-16 code units (a surrogate pair).


kkoopa · 2017-04-12T21:03:08Z

You are probably right. It is better to err on the side of caution. Character encoding should be added to the list of 2 hard problems in CS, which would now be the 3 hard problems:

Cache invalidation
Naming
Character encoding
Off-by-one errors

jasongin · 2017-04-12T21:38:39Z

And in case anyone is wondering, the JavaScript String.prototype.length property returns the number of UTF-16 code units, which may be different from the number of characters. So, getting the true character count is not common with JavaScript, and is probably best left to specialized internationalization libraries.
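
To make the difference concrete, here is a minimal JavaScript sketch, assuming a Node.js (or any ES2015+) runtime. The helper names `codeUnitCount` and `codePointCount` are hypothetical, made up for this illustration. Note that even the code point count is not a true character count: a letter followed by a combining accent is two code points but one visible character.

```javascript
// Hypothetical helpers for illustration; they are not part of any N-API or Node.js API.

// UTF-16 code units: this is what String.prototype.length reports.
function codeUnitCount(str) {
  return str.length;
}

// Unicode code points: the string iterator yields a surrogate pair as one element.
function codePointCount(str) {
  return [...str].length;
}

const sample = 'a\u{1F600}'; // "a" followed by U+1F600, which needs a surrogate pair
console.log(codeUnitCount(sample));  // 3
console.log(codePointCount(sample)); // 2

// "e" plus U+0301 (combining acute accent): one visible character, two code points.
console.log(codePointCount('e\u0301')); // 2
```

Neither number is a reliable count of user-perceived characters, which is the case the comment above leaves to internationalization libraries.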

trevnorris · 2017-04-14T02:14:11Z

@jasongin Personally I think the byte length would be the most useful b/c that's what's needed if you want to copy out the string. For that you could just copy the method used by Buffer.byteLength(), and in the same way could extend the API to accept an encoding.

jasongin · 2017-04-14T17:38:07Z

@trevnorris There are other APIs to get the byte length, which is why this API should just be removed.

napi_get_value_string_utf8() gets the length of the UTF-8 encoding of a string, in bytes.
napi_get_value_string_utf16() gets the length of the UTF-16 encoding of a string, in 2-byte code units.
In both of those, if you just want the length without copying the string, you can pass in a null string buffer.
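
For comparison, the same two encoded lengths can be checked from the JavaScript side with Buffer.byteLength(), mentioned earlier in the thread. This is only a sketch, assuming Node.js where `Buffer` is a global. It does not call the N-API functions themselves, and the helper name `encodedLengths` is hypothetical.

```javascript
// Hypothetical helper for illustration; it mirrors the units the two N-API
// functions report, but it is plain Node.js JavaScript, not N-API.
function encodedLengths(str) {
  return {
    // Length of the UTF-8 encoding, in bytes.
    utf8Bytes: Buffer.byteLength(str, 'utf8'),
    // Length of the UTF-16 encoding, in 2-byte code units.
    utf16Units: Buffer.byteLength(str, 'utf16le') / 2,
  };
}

const text = 'a\u{1F600}'; // "a" followed by U+1F600
console.log(encodedLengths(text)); // { utf8Bytes: 5, utf16Units: 3 }
```

The two values differ for any non-ASCII string, which is why an encoding-independent length is of little use when sizing a buffer to copy the string out.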

mhdawson · 2017-04-18T19:52:18Z

I'm +1 for removing. It's safer to start with a smaller API and add back than to be stuck with methods that cause confusion and are not really needed.

This API doesn't serve much purpose, and is only likely to cause confusion and bugs. The intention was that this would return the number of characters in a string independent of encoding, but that's not generally useful. In almost all cases, one of the encoding-specific napi_get_value_string_* APIs is more correct. (Pass a null buffer if only the encoded length is desired.)

Anyway the current implementation of napi_get_value_string_length() is technically wrong: it returns the number of 2-byte code units of the UTF-16 encoding, but there are actually some characters that are encoded as two UTF-16 code units.

Note the JavaScript String.prototype.length property returns the number of UTF-16 code units, which may be different from the number of characters. So, getting the true character count is not common with JavaScript, and is probably best left to specialized internationalization libraries.

Backport-PR-URL: #19447
PR-URL: nodejs/node#12496
Fixes: nodejs/abi-stable-node#226
Reviewed-By: Michael Dawson
Reviewed-By: Jeremiah Senkpiel
Reviewed-By: James M Snell

jasongin mentioned this issue Apr 14, 2017

String::Length missing nodejs/node-addon-api#14

Closed

jasongin mentioned this issue Apr 18, 2017

n-api: remove napi_get_value_string_length() nodejs/node#12496

Closed


mhdawson closed this as completed in nodejs/node@468275a Apr 20, 2017
