Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dns: default to verbatim=true in dns.lookup() #37931

Closed
wants to merge 8 commits into from

Conversation

treysis
Copy link
Contributor

@treysis treysis commented Mar 26, 2021

Switch the default from false (reorder the result so that IPv4 addresses come before IPv6 addresses) to true (return them exactly as the resolver sent them to us.)

After many tests failed due to differently resolving localhost to ::1 or 127.0.0.1 and after it became impossible to keep track of what worked and what didn't in the CI jobs during running the test suite, this is a fresh start from what we learned in, and effectively a reissue of #37681.

Fixes: #31566
Refs: #6307
Refs: #20710
Reissue of #31567

@nodejs-github-bot nodejs-github-bot added dns Issues and PRs related to the dns subsystem. needs-ci PRs that need a full CI run. labels Mar 26, 2021
Switch the default from false (reorder the result so that IPv4 addresses
come before IPv6 addresses) to true (return them exactly as the resolver
sent them to us.)

Fixes: nodejs#31566
Refs: nodejs#6307
Refs: nodejs#20710
Reissue of nodejs#31567
Reissue of nodejs#37681
@aduh95 aduh95 added the semver-major PRs that contain breaking changes and should be released in the next major version. label Mar 26, 2021
@nodejs-github-bot
Copy link
Collaborator

@treysis
Copy link
Contributor Author

treysis commented Mar 27, 2021

So, I went again through any single test that failed on my machine and tried to fix it. I found the following tests to fail and did the following to fix them. Let's also have a CI job to see what problems exist on the other setups.

Explanations:

  • test/parallel/test-cluster-message.js: add family: 4 to connect because without it will default to localhost which could resolve to ::1 if IPv6 is available, while server listen address is hardcoded legacy IPv4.

  • test/parallel/test-http-localaddress.js: add family: 4, to connect because localhost could resolve to ::1 if IPv6 is available, while server listen address is hardcoded legacy IPv4.

  • test/parallel/test-http-upgrade-client.js: add family: 4, to http.get because http.get without specified host uses localhost which could resolve to ::1 if IPv6 is available, while server listen address is hardcoded legacy IPv4.

  • test/parallel/test-http2-connect-options.js: add family: 4 to http2.connect options because localhost might resolve to ::1 if IPv6 is available, causing EINVAL when binding to 127.0.0.2, and server listen address is hardcoded legacy IPv4.

  • test/parallel/test-https-localaddress.js: add family: 4 to https.request options because localhost might resolve to ::1 if IPv6 is available, causing EINVAL when binding to 127.0.0.2, and server listen address is hardcoded legacy IPv4.

  • test/common/inspector-helper.js: add family: 4 to http.get options to make sure to connect via legacy IPv4 localhost, because inspector by default listens on legacy IPv4 localhost. Fixes:

    • test/parallel/test-inspect-async-hook-setup-at-inspect.js
    • test/parallel/test-inspector-esm.js
    • test/parallel/test-inspector-inspect-brk-node.js
    • test/parallel/test-inspector-multisession-ws.js
    • test/parallel/test-inspector-reported-host.js
    • test/parallel/test-inspector-wait-for-connection.js
    • test/parallel/test-inspector-waiting-for-disconnect.js
    • test/sequential/test-inspector.js
    • test/sequential/test-inspector-async-hook-setup-at-inspect-brk.js
    • test/sequential/test-inspector-async-hook-setup-at-signal.js
    • test/sequential/test-inspector-async-stack-traces-promise-then.js
    • test/sequential/test-inspector-async-stack-traces-set-interval.js
    • test/sequential/test-inspector-break-e.js
    • test/sequential/test-inspector-break-when-eval.js
    • test/sequential/test-inspector-console.js
    • test/sequential/test-inspector-debug-brk-flag.js
    • test/sequential/test-inspector-debug-end.js
    • test/sequential/test-inspector-exception.js
    • test/sequential/test-inspector-not-blocked-on-idle.js
    • test/sequential/test-inspector-scriptparsed-context.js
    • test/sequential/test-inspector-stop-profile-after-done.js
    • test/sequential/test-inspector-stress-http.js
  • test/parallel/test-net-dns-lookup.js: listen on common.localhostIPv4 and add family: 4 to connect to force legacy IPv4, because it is required by the assertion tests. Could be changed to DualStack compatibility by using assert.match(ip, /^(127\.0\.0\.1|::1)$/); and assert.match(type.toString, /^(4|6)$/);.

  • test/parallel/test-net-remote-address-port.js: add ::1 to remoteAddrCandidates. Why does this fix it?

  • test/parallel/test-tcp-wrap-listen.js: add ternary test with hasIPv6 and use bind or bind6 accordingly.

  • test/parallel/test-timers-socket-timeout-removes-other-socket-unref-timer.js: add family: 4 to connect because it will default to localhost being ::1 if IPv6 is available, and listen address is hardcoded to be legacy IPv4 address for localhost.

  • test/parallel/test-tls-client-getephemeralkeyinfo.js: add family: 4 to connect because without it will default to localhost which could resolve to ::1 if IPv6 is available, while server listen address is hardcoded legacy IPv4.

  • test/parallel/test-tls-client-mindhsize.js: add family: 4 to connect because without it will default to localhost which could resolve to ::1 if IPv6 is available, while server listen address is hardcoded legacy IPv4.

  • test/parallel/test-tls-wrap-econnreset-localaddress.js: add family: 4 to connect because without it will default to localhost which could resolve to ::1 if IPv6 is available, while server listen address is hardcoded legacy IPv4.

  • test/sequential/test-https-connect-localport.js: remove family: 4 from connect because listen uses localhost which maybe resolves to ::1? NEEDS TESTING! If it fails --> force legacy IPv4 with common.localhostIPv4.

  • test/sequential/test-inspector-open.js: add family: 4 to connect because without it will default to localhost which could resolve to ::1 if IPv6 is available, while inspector is listening on legacy IPv4 localhost by default.

  • test/sequential/test-net-connect-local-error.js: add family: 4 to optionsIPv4 and family: 6 to optionsIPv6, because it seems connecting on IPv4 will default to localhost which then resolves to ::1 if IPv6 is available and will cause an error.

Explanations:
-------------
test/parallel/test-cluster-message.js: add `family: 4` to `connect` because without it will default to `localhost` which could resolve to `::1` if IPv6 is available, while server listen address is hardcoded legacy IPv4.
test/parallel/test-http-localaddress.js: add `family: 4,` to `connect` because `localhost` could resolve to `::1` if IPv6 is available, while server listen address is hardcoded legacy IPv4.
test/parallel/test-http-upgrade-client.js: add `family: 4,` to `http.get` because `http.get` without specified host uses `localhost` which could resolve to `::1` if IPv6 is available, while server listen address is hardcoded legacy IPv4.
test/parallel/test-http2-connect-options.js: add `family: 4` to `http2.connect` `options` because `localhost` might resolve to `::1` if IPv6 is available, causing `EINVAL` when binding to `127.0.0.2`, and server listen address is hardcoded legacy IPv4.
test/parallel/test-https-localaddress.js: add `family: 4` to `https.request` `options` because `localhost` might resolve to `::1` if IPv6 is available, causing `EINVAL` when binding to `127.0.0.2`, and server listen address is hardcoded legacy IPv4.

test/common/inspector-helper.js: add `family: 4` to `http.get` options to make sure to connect via legacy IPv4 `localhost`, because inspector by default listens on legacy IPv4 `localhost`.
	Fixes:
		test/parallel/test-inspect-async-hook-setup-at-inspect.js
		test/parallel/test-inspector-esm.js
		test/parallel/test-inspector-inspect-brk-node.js
		test/parallel/test-inspector-multisession-ws.js
		test/parallel/test-inspector-reported-host.js
		test/parallel/test-inspector-wait-for-connection.js
		test/parallel/test-inspector-waiting-for-disconnect.js
		test/sequential/test-inspector.js
		test/sequential/test-inspector-async-hook-setup-at-inspect-brk.js
		test/sequential/test-inspector-async-hook-setup-at-signal.js
		test/sequential/test-inspector-async-stack-traces-promise-then.js
		test/sequential/test-inspector-async-stack-traces-set-interval.js
		test/sequential/test-inspector-break-e.js
		test/sequential/test-inspector-break-when-eval.js
		test/sequential/test-inspector-console.js
		test/sequential/test-inspector-debug-brk-flag.js
		test/sequential/test-inspector-debug-end.js
		test/sequential/test-inspector-exception.js
		test/sequential/test-inspector-not-blocked-on-idle.js
		test/sequential/test-inspector-scriptparsed-context.js
		test/sequential/test-inspector-stop-profile-after-done.js
		test/sequential/test-inspector-stress-http.js

test/parallel/test-net-dns-lookup.js: listen on `common.localhostIPv4` and add `family: 4` to `connect` to force legacy IPv4, because it is required by the assertion tests. Could be changed to DualStack compatibility by using `assert.match(ip, /^(127\.0\.0\.1|::1)$/);` and `assert.match(type.toString, /^(4|6)$/);`.
test/parallel/test-net-remote-address-port.js: add `::1` to `remoteAddrCandidates`. Why does this fix it?
test/parallel/test-tcp-wrap-listen.js: add ternary test with `hasIPv6` and use `bind` or `bind6` accordingly.
test/parallel/test-timers-socket-timeout-removes-other-socket-unref-timer.js: add `family: 4` to `connect` because it will default to localhost being `::1` if IPv6 is available, and listen address is hardcoded to be legacy IPv4 address for `localhost`.
test/parallel/test-tls-client-getephemeralkeyinfo.js: add `family: 4` to `connect` because without it will default to `localhost` which could resolve to `::1` if IPv6 is available, while server listen address is hardcoded legacy IPv4.
test/parallel/test-tls-client-mindhsize.js: add `family: 4` to `connect` because without it will default to `localhost` which could resolve to `::1` if IPv6 is available, while server listen address is hardcoded legacy IPv4.
test/parallel/test-tls-wrap-econnreset-localaddress.js: add `family: 4` to `connect` because without it will default to `localhost` which could resolve to `::1` if IPv6 is available, while server listen address is hardcoded legacy IPv4.
test/sequential/test-https-connect-localport.js: remove `family: 4` from connect because listen uses `localhost` which maybe resolves to `::1`? NEEDS TESTING! If it fails --> force legacy IPv4 with `common.localhostIPv4`.
test/sequential/test-inspector-open.js: add `family: 4` to `connect` because without it will default to `localhost` which could resolve to `::1` if IPv6 is available, while inspector is listening on legacy IPv4 `localhost` by default.
test/sequential/test-net-connect-local-error.js: add `family: 4` to `optionsIPv4` and `family: 6` to `optionsIPv6`, because it seems connecting on IPv4 will default to `localhost` which then resolves to `::1` if IPv6 is available and will cause an error.
doc/api/dns.md Outdated Show resolved Hide resolved
@nodejs-github-bot
Copy link
Collaborator

@treysis
Copy link
Contributor Author

treysis commented Mar 27, 2021

Alright, so the preliminary results confirm something that I was suspecting. While we could replace 127.0.0.1 in probably all instances with localhost, this will break many tests on SmartOS. The reason is what we have found out and discussed in the previous PR: SmartOS can resolve localhost to ::1 when trying to open a listening socket. But it will refuse to resolve localhost to ::1 when opening a connection if there is no configured IPv6 address other than ::1, and instead resolves to 127.0.0.1.

Thus, I see two options:

  1. Apply a more general approach using localhost in favor of 127.0.0.1 and mark all those tests flaky for SmartOS, or
  2. Force IPv4 on all systems where SmartOS otherwise fails.

In my opinion, the way to go currently is approach 2. Approach 1 should be something implemented in a separate PR that tries to be more DualStack-aware. In approach 1 it would probably also make sense to review the implementation of DNS resolution on listening and connecting and to make sure they resolve to the same values.

@treysis
Copy link
Contributor Author

treysis commented Mar 27, 2021

SmartOS fixes:

  • test/parallel/test-net-connect-options-port.js: add family: 4 at various places to force localhost or default localhost to be resolved to legacy IPv4 addresses.
  • test/parallel/test-net-pingpong.js: change localhost to 127.0.0.1 because test is aimed at legacy IPv4, so it can be forced to literal IPv4 address without problem. NEEDS TESTING: does pingPongTest(0); pass?
  • test/parallel/test-net-remote-address-port.js: change localhost to 127.0.0.1 and force IPv4 by adding family: 4 because SmartOS will resolve localhost or no address automatically to legacy IPv4 address (127.0.0.1).
  • test/parallel/test-net-writable.js: listen on common.localhostIPv4 and add family: 4 to connect to force IPv4. because SmartOS will resolve localhost or no address automatically to legacy IPv4 address (127.0.0.1).
  • test/sequential/test-https-connect-localport.js: revert previous change, i.e. readd family: 4 to https.get. Change localhost on listen to common.localhostIPv4 to force legacy IPv4.

Explanations:
test/parallel/test-net-connect-options-port.js: add `family: 4` at various places to force `localhost` or default `localhost` to be resolved to legacy IPv4 addresses.
test/parallel/test-net-pingpong.js: change `localhost` to `127.0.0.1` because test is aimed at legacy IPv4, so it can be forced to literal IPv4 address without problem. NEEDS TESTING: does `pingPongTest(0);` pass?
test/parallel/test-net-remote-address-port.js: change `localhost` to `127.0.0.1` and force IPv4 by adding `family: 4` because SmartOS will resolve `localhost` or no address automatically to legacy IPv4 address (`127.0.0.1`).
test/parallel/test-net-writable.js: listen on `common.localhostIPv4` and add `family: 4` to connect to force IPv4. because SmartOS will resolve `localhost` or no address automatically to legacy IPv4 address (`127.0.0.1`).
test/sequential/test-https-connect-localport.js: revert previous change, i.e. readd `family: 4` to `https.get`. Change `localhost` on listen to `common.localhostIPv4` to force legacy IPv4.
@aduh95
Copy link
Contributor

aduh95 commented Mar 27, 2021

FWIW I don't think modifying the tests just so they can pass on SmartOS is OK. Depending on the cause of the failure:

  1. If this is an issue with our CI configuration, we should mark the tests as flaky on SmartOS and try to fix them later.
  2. If this is an issue with SmartOS itself, we should report the issue to them and hold on landing this until a fix is available.

Either way, I don't think modifying failing tests is the right thing to do here. @nodejs/platform-smartos wdyt?

The changes in bdedf3a LGTM though (thanks for the detailed explanation BTW, that's really good work).

@treysis
Copy link
Contributor Author

treysis commented Mar 27, 2021

  1. If this is an issue with our CI configuration, we should mark the tests as flaky on SmartOS and try to fix them later.
  2. If this is an issue with SmartOS itself, we should report the issue to them and hold on landing this until a fix is available.

I think it's both. Refusing to use ::1 when connecting to localhost while no other IPv6 address configured seems to be inherent to SmartOS. I don't know if this might even be a general thing on Unix>BSD>Solaris. I created a VM with SmartOS and ping -s localhost would use 127.0.0.1, while ping -s -A inet6 localhost would fail with unknown host despite having ::1 localhost in /etc/hosts, and ifconfig showing ::1 being assigned to lo. It would work as soon as there's another IPv6 address configured. I don't think anyone is really going to change this.

On the other hand, I don't understand why it is binding fine to ::1 when specifying localhost in this situation. In my opinion this shouldn't be happening, and I am not sure yet if this is related to NodeJS, or SmartOS.

The changes in bdedf3a LGTM though (thanks for the detailed explanation BTW, that's really good work).

Without the explanations I probably would have lost track myself. It was necessary to have an overview over what needs to be addressed. But thanks for the acknowledgement.

@treysis
Copy link
Contributor Author

treysis commented Mar 28, 2021

Do the Linux CIs have IPv6 connectivity? Even though I don't really want to admit it...if I remove the public IPv6 address on my system, I observe the same problem as on SmartOS: listening on localhost means binding to ::1, connecting to localhost means connecting to 127.0.0.1.

I am not sure if this is related to the DNS module or to the resolver of the OS. If I do ping localhost it will always do ping ::1 on my system (Linux Mint), no matter if public IPv6 configured or not.

Edit: Thinking a bit about it I think we should adjust the tests. There is the chance (in theory) that having an IPv6 address configured or not might change during runtime. Since we can't listen DualStack on localhost, it makes sense to force the few tests to IPv4. I actually also don't see any reason why those tests use localhost instead of the hardcoded 127.0.0.1 that most of the other tests use anyways.

Edit2: It's related to getaddrinfo() and is the expected behavior if loopback is the only IPv6 address on the system: https://man7.org/linux/man-pages/man3/getaddrinfo.3.html

@treysis
Copy link
Contributor Author

treysis commented Mar 28, 2021

I think we should ditch AF_ADDRCONFIG. This was already suggested by @bnoordhuis in the previous discussions, and has been suggested in other projects not related to NodeJS as well.

On another node, @bnoordhuis also suggested ditching common.localhostIPv4 and replace it by hardcoded 127.0.0.1, as this method is just unnecessary.

A commit will follow shortly.

@treysis
Copy link
Contributor Author

treysis commented Mar 28, 2021

Can we get another CI job?

@nodejs-github-bot
Copy link
Collaborator

@treysis
Copy link
Contributor Author

treysis commented Mar 28, 2021

Great! So, what else needs to be discussed now in order to merge this?

What should be mentioned in the documentation: usage of localhost should be discouraged. Use 127.0.0.1 or ::1.

test/sequential/test-https-connect-localport.js Outdated Show resolved Hide resolved
@mhdawson
Copy link
Member

I'm in agreement with @mcollina suggestion - #37931 (comment) as a way to make landing the change feasible.

@treysis
Copy link
Contributor Author

treysis commented Apr 12, 2021

What I am asking is a way to change the default globally, i.e. require('dns').verbatim = true

Is this kind of syntax supported by NodeJS/Javascript?

@addaleax
Copy link
Member

What I am asking is a way to change the default globally, i.e. require('dns').verbatim = true

Is this kind of syntax supported by NodeJS/Javascript?

Yes.

That being said, this might be a case where an environment variable seems like a better choice (because it is configurable from outside the process, which is relevant because this affects users just as much as it does developers, and because that way it is inherited by Workers and child processes by default).

@treysis
Copy link
Contributor Author

treysis commented Apr 12, 2021

Yes.

Thx. Unfortunately, I haven't been able to find some example code how to do this. Still, I can't even find any documentation of this syntax.

That being said, this might be a case where an environment variable seems like a better choice (because it is configurable from outside the process, which is relevant because this affects users just as much as it does developers, and because that way it is inherited by Workers and child processes by default).

Yes. But maybe we can implement both solutions. The inside process option to make it easier for the devs (though I feel this might lead to some devs just setting it 'to be sure' and never touching it again), and the outside option with higher precedence to eventually let the user decide.

@treysis
Copy link
Contributor Author

treysis commented Apr 12, 2021

I am not very well versed in Javascript. I believe we have to do sth like this:

// lib/dns.js
...
let verbatim = true;
if (this.verbatimglobal == false) {
  verbatim = false;
  options.verbatim = false;
}
...

// any app that uses NodeJS' dns module:
var dns = require('dns');
dns.verbatimglobal = false;

But that throws an error/warning if dns.verbatimglobal is not set.

I couldn't find a way to make var dns = require('dns').verbatim = false work.

@aduh95
Copy link
Contributor

aduh95 commented Apr 13, 2021

I couldn't find a way to make require('dns').verbatim = false work.

Implementation is done in #38099, once it has landed it should unblock this PR.

@mhdawson
Copy link
Member

Was discussed in the TSC meeting today. From those who were there (only a small subset) consensus was that the opt-in approach would be ok. That can be landed after 16.x is released, likely too late for a SemVer major version as it's unlikely that we can reach consensus.

@mcollina should this still stay on the TSC agenda or is that the feedback you were looking for.

@mcollina mcollina removed the tsc-agenda Issues and PRs to discuss during the meetings of the TSC. label Apr 15, 2021
@mcollina
Copy link
Member

removed!

@treysis
Copy link
Contributor Author

treysis commented Apr 16, 2021

So as I understand 'opt-in' means v16 will allow global configuration of the verbatim order, but for v16 it will still default to false?

@jasnell
Copy link
Member

jasnell commented Apr 16, 2021

Yes that is correct

@treysis
Copy link
Contributor Author

treysis commented Apr 16, 2021

I think it would be great to have a way for the user to configure the behavior via environment-variables until this PR lands.

@aduh95
Copy link
Contributor

aduh95 commented Jun 2, 2021

@treysis Would you be interested in resurrecting this PR now that #38099 has landed?

@mcollina
Copy link
Member

mcollina commented Jun 2, 2021

I'm +1 for this landing as the default in Node v17.

@treysis
Copy link
Contributor Author

treysis commented Jun 2, 2021

Of course. I have loosely been monitoring the other PR. But it might take me some days until I can have a look at it, understand what exactly the other PR did, and adjust to the changes.

@aduh95
Copy link
Contributor

aduh95 commented Sep 3, 2021

@treysis as a heads up, Node.js v17.x semver cutoff is getting close (around September 19th I think, the release date is scheduled for October 19th). If you were able to rebase your branch on top on master, we could have this change included in 17.0.0.

@treysis
Copy link
Contributor Author

treysis commented Sep 3, 2021

@aduh85 Thx. I was about to dive into it again already the past days but didn't find the time/motivation yet. The upcoming deadline might be just what is needed :)

@treysis
Copy link
Contributor Author

treysis commented Sep 3, 2021

Before I submit a new PR...where is the getDefaultVerbatim getting its value from? I guess I should leave it like that in the code and change whatever defines getDefaultVerbatim, right? I think it's in lib/internal/dns/utils.js line 189?

treysis pushed a commit to treysis/io.js that referenced this pull request Sep 3, 2021
Switch the default from false (reorder the result so that IPv4 addresses
come before IPv6 addresses) to true (return them exactly as the resolver
sent them to us.)

Fixes: nodejs#31566
Refs: nodejs#6307
Refs: nodejs#20710
Reissue of nodejs#31567
Reissue of nodejs#37681
Reissue of nodejs#37931
treysis pushed a commit to treysis/io.js that referenced this pull request Sep 3, 2021
Switch the default from false (reorder the result so that IPv4 addresses
come before IPv6 addresses) to true (return them exactly as the resolver
sent them to us.)

Fixes: nodejs#31566
Refs: nodejs#6307
Refs: nodejs#20710
Reissue of nodejs#31567
Reissue of nodejs#37681
Reissue of nodejs#37931
treysis pushed a commit to treysis/io.js that referenced this pull request Sep 3, 2021
Switch the default from false (reorder the result so that IPv4 addresses
come before IPv6 addresses) to true (return them exactly as the resolver
sent them to us.)

Fixes: nodejs#31566
Refs: nodejs#6307
Refs: nodejs#20710
Reissue of nodejs#31567
Reissue of nodejs#37681
Reissue of nodejs#37931
treysis pushed a commit to treysis/io.js that referenced this pull request Sep 3, 2021
Switch the default from false (reorder the result so that IPv4 addresses
come before IPv6 addresses) to true (return them exactly as the resolver
sent them to us.)

Fixes: nodejs#31566
Refs: nodejs#6307
Refs: nodejs#20710
Refs: nodejs#38099
Reissue of nodejs#31567
Reissue of nodejs#37681
Reissue of nodejs#37931
nodejs-github-bot pushed a commit that referenced this pull request Sep 12, 2021
Switch the default from false (reorder the result so that IPv4 addresses
come before IPv6 addresses) to true (return them exactly as the resolver
sent them to us.)

Fixes: #31566
Refs: #6307
Refs: #20710
Refs: #38099
Reissue of #31567
Reissue of #37681
Reissue of #37931

PR-URL: #39987
Refs: #6307
Refs: #20710
Refs: #38099
Reviewed-By: Antoine du Hamel <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
@aduh95
Copy link
Contributor

aduh95 commented Sep 12, 2021

Superseded by 1b2749e.

@aduh95 aduh95 closed this Sep 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dns Issues and PRs related to the dns subsystem. needs-ci PRs that need a full CI run. needs-citgm PRs that need a CITGM CI run. review wanted PRs that need reviews. semver-major PRs that contain breaking changes and should be released in the next major version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dns: default to verbatim=true in dns.lookup()