Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

url: make ICU-dependent functions warn for node compiled --without-intl #35099

Closed

Conversation

aduh95
Copy link
Contributor

@aduh95 aduh95 commented Sep 8, 2020

ICU dependent functions throw an ERR_NO_ICU error on Node.js versions
compiled without ICU.

FYI the current behaviour is to return the input string unchanged when ICU is not available:

$ out/Release/node -p process.versions | grep icu # compiled --without-intl
$ out/Release/node -p "require('url').domainToUnicode('xn--wgv71a119e.com')"
xn--wgv71a119e.com
$ out/Release/node -p "require('url').domainToASCII('www.日本語.com')"
www.日本語.com

That's – to say the least – unexpected taht the function silently fails. This PR addresses that:

$ out/Release/node -p "require('url').domainToUnicode('xn--wgv71a119e.com')"
Warning: Cannot convert to ASCII when intl is disabled!
Warning: Cannot convert to unicode when intl is disabled!
xn--wgv71a119e.com
$ out/Release/node -p "require('url').domainToASCII('www.日本語.com')"
Warning: Cannot convert to ASCII when intl is disabled!
www.日本語.com
$
$ # This also affect WHATWG `URL`:
$
$ out/Release/node -e "new URL('http://www.日本語.com')"
Warning: Cannot convert to ASCII when intl is disabled!
$ out/Release/node -e "new URL('http://github.com')"    
Warning: Cannot convert to ASCII when intl is disabled!

This is a semver-major change.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label Sep 8, 2020
@aduh95 aduh95 force-pushed the url-icu-dependent-functions-throw branch 2 times, most recently from 3659a73 to 6b8dbe1 Compare September 8, 2020 09:00
@targos targos added the semver-major PRs that contain breaking changes and should be released in the next major version. label Sep 8, 2020
@addaleax addaleax added i18n-api Issues and PRs related to the i18n implementation. url Issues and PRs related to the legacy built-in url module. labels Sep 9, 2020
Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure if this is worth the potential breakage, and instead it might be better to call this behavior out in the documentation?

Most people who write code for the ecosystem are not going to expect an exception here, because the default build of Node.js doesn’t fail this way, so passing the data through unchanged might be the better choice, even if it’s not ideal.

doc/api/url.md Outdated Show resolved Hide resolved
doc/api/url.md Outdated Show resolved Hide resolved
@aduh95
Copy link
Contributor Author

aduh95 commented Sep 9, 2020

I’m not sure if this is worth the potential breakage

Should it emit a warning instead of throwing an error? IMHO that would be better than failing silently.

@aduh95
Copy link
Contributor Author

aduh95 commented Sep 9, 2020

For reference, TextDecoder already throws a ERR_NO_ICU when initiated with { fatal: true }. Another way of handling this would be to add a similar option to domainToUnicode and domainToASCII for users to opt-in on that behaviour.

@addaleax
Copy link
Member

addaleax commented Sep 9, 2020

@aduh95 I think both of those options make sense, yes :)

@srl295
Copy link
Member

srl295 commented Sep 19, 2020

+1 on the opt-in, I don't think it would be good to add an unexpected exception.

As to the name, ICU is technically correct, however, from the builder's point of view the term has been "intl" such as from configure docs:

    --with-intl=WITH_INTL
                        Intl mode (valid choices: none, small-icu, full-icu,
                        system-icu) [default: small-icu]
    --without-intl      Disable Intl, same as --with-intl=none (disables
                        inspector)

So, Node.js was built using the options --with-intl=none or --without-intl in order to get you into the situation where ICU is not available. And, the functions mentioned here are in fact intl related (though not part of Ecma Intl).

So, perhaps the option should be called ERR_WITHOUT_INTL or ERR_NO_INTL?

However, I see there's already a:

E('ERR_NO_ICU',
  '%s is not supported on Node.js compiled without ICU', TypeError);

So this (potential) inconsistency is already there.

@aduh95 aduh95 force-pushed the url-icu-dependent-functions-throw branch from 5f3fbc8 to 189677f Compare September 26, 2020 12:00
@aduh95 aduh95 changed the title url: make ICU-dependent functions throw url: make ICU-dependent functions warn for node compiled --without-intl Sep 26, 2020
@aduh95
Copy link
Contributor Author

aduh95 commented Sep 26, 2020

I have removed the exception throwing, instead, a message is printed to stderr notifying the user that the operation is a no-op. I'm not 100% satisfied with this solution as it doesn't create a proper warning that can be inhibited with a flag, but given the fact that the function doesn't have access an Environment instance, I wasn't able to figure it out.

I'd argue that this is better than the current behaviour (no warning at all). What do y'all think?

@Trott
Copy link
Member

Trott commented Oct 1, 2020

@addaleax @srl295 @TimothyGu @joyeecheung @jasnell Thoughts on this approach?

@Trott
Copy link
Member

Trott commented Oct 1, 2020

Also, is this still semver-major since it doesn't throw anymore but merely prints a warning? I'm inclined to say "technically, yes, so let's be cautious and treat it as a breaking change" but I also wonder if that's a bit much....

@addaleax
Copy link
Member

addaleax commented Oct 1, 2020

It would be nice if these warnings were emitted either via process.emitWarning() in JS (it’s okay to create wrapper functions for --without-intl builds imo) or ProcessEmitWarning() in C++ instead, because that way this works with --no-warning, --trace-warning, process.on('warning'), etc.

Also, yes, I think we should treat this as semver-major. Apart from that, I have no strong opinions here (except for those that I already voiced and that have been addressed :)).

@@ -787,11 +787,13 @@ bool ToASCII(const std::string& input, std::string* output) {
#else
// Intentional non-ops if ICU is not present.
bool ToUnicode(const std::string& input, std::string* output) {
std::cerr << "Warning: Cannot convert to unicode when intl is disabled!\n";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These definitely would need to use the emitWarning() mechanism. Also, I'd prefer if these were opt-in and gated by a command line flag.

Copy link
Member

@srl295 srl295 Oct 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if these warnings were emitted either via process.emitWarning() in JS (it’s okay to create wrapper functions for --without-intl builds imo) or ProcessEmitWarning() in C++ instead, because that way this works with --no-warning, --trace-warning, process.on('warning'), etc.

Im not familiar with those functions, but it does sound better than just printf(stderr
Does emitwarning elide duplicate warnings? i.e. only print a warning once?

Also, yes, I think we should treat this as semver-major. Apart from that, I have no strong opinions here (except for those that I already voiced and that have been addressed :)).

I think it is major because it's long standing behavior.

( @aduh95 out of curiosity, just to keep myself up to date, what is the major use case that you see for without-intl — smaller size, and/or embedded devices, ?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The emitWarning() API does a number of things... (a) it ensures that all warnings are emitted in a consistent format, (b) it allows warnings to be supressed using command-line flag or environment variables, (b) it forwards the warnings to the process.on('warning') event which is emitted even if the warning output to the console has been disabled (allowing warnings to be logged using third party tools). It does not currently limit warnings to be printed only once unless those are DeprecationWarning or ExperimentalWarning but that is something we can look at later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the major use case that you see for without-intl?

@srl295 As a matter of fact, I don't use it :) I stumbled upon this weird behavior when working on #35091, so I figure that was probably not intentional.

@jasnell do you know if emitWarning API is available from C++-land and how to access it? I agree that is the "correct" way to do it, but I didn't manage to get it working.

Another approach I have thought of would be to call the punycode JS module, but again I don't know if that's possible from C++-land. That would remove the need for a warning (or a throw).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, do a search in src for ProcessEmitWarning and you'll see how to make that happen

@jasnell
Copy link
Member

jasnell commented May 13, 2021

@aduh95 ... did you still want to pursue this?

@jasnell jasnell added the stalled Issues and PRs that are stalled. label May 13, 2021
@github-actions
Copy link
Contributor

This issue/PR was marked as stalled, it will be automatically closed in 30 days. If it should remain open, please leave a comment explaining why it should remain open.

@aduh95 aduh95 closed this May 15, 2021
@aduh95 aduh95 deleted the url-icu-dependent-functions-throw branch May 15, 2021 23:02
RaisinTen added a commit to RaisinTen/node that referenced this pull request May 23, 2021
Continuation of: nodejs#35099

Signed-off-by: Darshan Sen <[email protected]>
RaisinTen added a commit to RaisinTen/node that referenced this pull request May 25, 2021
aduh95 pushed a commit to RaisinTen/node that referenced this pull request May 26, 2021
Continuation of: nodejs#35099

Signed-off-by: Darshan Sen <[email protected]>

PR-URL: nodejs#38789
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
danielleadams pushed a commit that referenced this pull request May 31, 2021
Continuation of: #35099

Signed-off-by: Darshan Sen <[email protected]>

PR-URL: #38789
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
richardlau pushed a commit that referenced this pull request Jul 22, 2021
Continuation of: #35099

Signed-off-by: Darshan Sen <[email protected]>

PR-URL: #38789
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
richardlau pushed a commit that referenced this pull request Jul 22, 2021
Continuation of: #35099

Signed-off-by: Darshan Sen <[email protected]>

PR-URL: #38789
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
foxxyz pushed a commit to foxxyz/node that referenced this pull request Oct 18, 2021
Continuation of: nodejs#35099

Signed-off-by: Darshan Sen <[email protected]>

PR-URL: nodejs#38789
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Antoine du Hamel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. i18n-api Issues and PRs related to the i18n implementation. semver-major PRs that contain breaking changes and should be released in the next major version. stalled Issues and PRs that are stalled. url Issues and PRs related to the legacy built-in url module.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants