Conditional unhandled 'error' event when http.request with .lookup
#48771
Comments
I'm not able to reproduce. Let's try to reduce the test case. What happens when you run this?
const lookup = (_0, _1, cb) => cb(null, [{address:"192.168.144.2", family:4}])
const s = require("net").connect({host:"example.com", port:80, lookup})
s.on("error", console.log) // reached? |
// @ts-nocheck
const lookup = (_0, _1, cb) => cb(null, [{ address: "192.168.144.2", family: 4 }])
const s = require("net").connect({ host: "example.com", port: 80, lookup })
s.on("error", (err) => {
  console.log({ err })
})
$ sudo ip route del blackhole 192.168.144.2
$ curl -s 192.168.144.2 >/dev/null && echo OK
OK
$ node ./src/extra/reproduce.cjs && echo OK
^C
$ sudo ip route add blackhole 192.168.144.2
$ node ./src/extra/reproduce.cjs && echo OK
{
err: Error: connect EINVAL 192.168.144.2:80 - Local (0.0.0.0:0)
at internalConnect (node:net:1087:16)
at defaultTriggerAsyncIdScope (node:internal/async_hooks:464:18)
at emitLookup (node:net:1478:9)
at lookup (/workspaces/loynoir/repo/reproduce-node-48771/src/extra/reproduce.cjs:2:32)
at emitLookup (node:net:1402:5)
at defaultTriggerAsyncIdScope (node:internal/async_hooks:464:18)
at lookupAndConnectMultiple (node:net:1401:3)
at node:net:1347:7
at defaultTriggerAsyncIdScope (node:internal/async_hooks:464:18)
at lookupAndConnect (node:net:1346:5) {
errno: -22,
code: 'EINVAL',
syscall: 'connect',
address: '192.168.144.2',
port: 80
}
}
OK |
Okay, so that works for you as well. Please try expanding the test until you have a minimal reproducer. |
https://github.com/loynoir/reproduce-node-48771
I think this is the minimal reproducer. I guess some Node.js code is not wrapped with So, if the user does not use nextTick(() => {
...
callback(... In some situations, there is an unhandled 'error' event within http.request, and leads to
|
FYI I tested whether my network family autoselection was involved in this. Apparently it is not: if you change lines 19 and 23 in `` in your repro repo to this:
to accommodate both single and multiple DNS lookups, then the problem will happen even if you use On macOS, in order to reproduce the problem you can route that IP to something unreachable. For instance:
(10.3.0.1 is unreachable from my system; you might have to use another IP). With the unreachable IP set up, I tried the narrowed-down example in #48771 (comment) and it worked, so it seems like the problem is not on the @mcollina Any thoughts on this? |
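A lookup function that serves both the single-address and the multiple-address callback shapes might be sketched as follows. This is only an illustration: `fixedLookup` is a hypothetical helper name, not code from the repro repo, and it assumes the caller signals the array form through `options.all`, as `dns.lookup` does. The callback is deferred with `process.nextTick` so it never fires in the same tick as the connect call.

```javascript
// Hypothetical helper (not from the repro repo): builds a `lookup` function
// that always resolves to one fixed address.
function fixedLookup(address, family = 4) {
  return (hostname, options, callback) => {
    // Defer the callback so it never runs synchronously inside connect().
    process.nextTick(() => {
      if (options && options.all) {
        // Multiple-address shape: an array of { address, family } records.
        callback(null, [{ address, family }]);
      } else {
        // Single-address shape: address and family as separate arguments.
        callback(null, address, family);
      }
    });
  };
}

const lookup = fixedLookup('192.168.144.2');
lookup('example.com', { all: true }, (err, addresses) => {
  console.log(addresses);
});
```

Such a function could then be passed as the `lookup` option of `net.connect` or `http.request`.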
It looks like a bug, unfortunately, I don't have time to dig deep on how to fix it. I've never seen this problem happen in practice, but I guess it can happen if the kernel is really fast in responding. A quick code review spotted the problem: when a socket is assigned to a ClientRequest, we defer to the next tick setting an error handler: Lines 860 to 900 in a2fc4a3
The Line 822 in a2fc4a3
Deferring by a Lines 1414 to 1418 in a2fc4a3
Can that happen? As a side note, I'd recommend you to use |
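The ordering problem described above can be modelled with a plain EventEmitter. This is a simplified sketch, not Node's internal code: `emitBeforeHandler` is a hypothetical name, and the emitter merely stands in for a socket whose 'error' handler is attached one tick after the failure is reported.

```javascript
const { EventEmitter } = require('node:events');

// Simplified model (an assumption, not Node internals): the 'error' handler
// is attached on the next tick, but the error is emitted right away.
function emitBeforeHandler() {
  const socket = new EventEmitter();
  // Handler registration is deferred, mirroring the deferred setup.
  process.nextTick(() => socket.on('error', () => {}));
  try {
    // No 'error' listener exists yet, so emit() throws the error itself.
    socket.emit('error', new Error('connect EINVAL'));
    return 'handled';
  } catch (err) {
    return `unhandled: ${err.message}`;
  }
}

console.log(emitBeforeHandler()); // prints "unhandled: connect EINVAL"
```

Without the try/catch, the thrown error would terminate the process, which matches the exit code 1 reported in this issue.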
When I use Although I cannot reproduce same error, I found |
@mcollina I think the only way to fix this is to defer the error emitting ( |
Verified Error is caught within
$ sudo ip route add blackhole 192.168.144.4
$ cat reproduce.mjs
import { Agent } from 'undici'
try {
  await fetch('http://example.com', {
    dispatcher: new Agent({
      connect: {
        lookup: (hostname, options, callback) => {
          // node <20
          // callback(null, '192.168.144.4', 4)
          // node 20
          callback(null, [{ address: '192.168.144.4', family: 4 }])
        }
      }
    })
  })
} catch (caught) {
  console.warn({ caught })
}
$ node reproduce.mjs
{
caught: TypeError: fetch failed
at Object.fetch (node:internal/deps/undici/undici:11576:11)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async file:///tmp/tmp.ogEc9I0rez/reproduce.mjs:4:5 {
cause: Error: connect EINVAL 192.168.144.4:80 - Local (0.0.0.0:0)
at internalConnect (node:net:1087:16)
at defaultTriggerAsyncIdScope (node:internal/async_hooks:464:18)
at emitLookup (node:net:1478:9)
at lookup (file:///tmp/tmp.ogEc9I0rez/reproduce.mjs:11:21)
at emitLookup (node:net:1402:5)
at defaultTriggerAsyncIdScope (node:internal/async_hooks:464:18)
at lookupAndConnectMultiple (node:net:1401:3)
at node:net:1347:7
at defaultTriggerAsyncIdScope (node:internal/async_hooks:464:18)
at lookupAndConnect (node:net:1346:5) {
errno: -22,
code: 'EINVAL',
syscall: 'connect',
address: '192.168.144.4',
port: 80
}
}
} |
@ShogunPanda yes |
@mcollina Hello. To fix the issue, I modified the onSocket function: if an error (err) is present, I immediately emit the 'error' event using this.emit('error', err) and then call this.destroy() to terminate the request. This way, the error is handled synchronously.
Modified Code:
ClientRequest.prototype.onSocket = function onSocket(socket, err) {
  if (err) {
    this.emit('error', err);
    this.destroy();
    return;
  }
  process.nextTick(onSocketNT, this, socket);
};
Open Questions: Is the proposed modification a correct and appropriate solution to this issue? Thanks! |
@mertcanaltin Thanks for your proposal. |
I wonder if I should open a pull request with this update @ShogunPanda |
I think so. Otherwise I already plan to include this issue in a future OSS onboarding event. |
Is this issue still open? I would like to take a look at it.
… tick
self.destroy calls in the internalConnect and internalConnectMultiple functions have a narrow case where they can throw before an error handler has been established. This change defers them to the next tick to allow time for the error handler to be set.
Fixes: nodejs#48771
I've opened a PR for this issue here: #51038. It adds the deferral on emitting in internalConnect and internalConnectMultiple described in @ShogunPanda's comment above. |
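The general idea of deferring the failure can be sketched with a plain EventEmitter. This is a simplified model and not the code in the PR: `failLater` and `connectStub` are hypothetical names, and the stub pretends the connect attempt fails synchronously.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical helper: report a failure on the next tick instead of
// synchronously, so the caller has time to attach an 'error' handler.
function failLater(socket, err) {
  process.nextTick(() => socket.emit('error', err));
}

// Hypothetical stub for a connect call that fails immediately.
function connectStub() {
  const socket = new EventEmitter();
  failLater(socket, new Error('connect EINVAL'));
  return socket;
}

function demo() {
  return new Promise((resolve) => {
    const socket = connectStub();
    // Attached after connectStub() returns, but before the deferred emit runs.
    socket.on('error', (err) => resolve(err.message));
  });
}

demo().then((message) => console.log(`handled: ${message}`));
```

Had `connectStub` emitted the error directly, the emit would have thrown before the caller got the socket back.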
Wrote tests for the suggested fix in nodejs#48771 (comment) Fixes: nodejs#48771
I would like to tackle this issue. Is it still open? |
@DheerenGaud I apologize, I completely forgot I had an open PR to work on for this issue. I don't think I have the time to go back and figure out those tests so please feel free to pick this up (assuming it's still a valid issue). |
Version
20.4.0
Platform
Docker ArchLinux 6.1.35-1-lts
Subsystem
No response
What steps will reproduce the bug?
https://github.com/loynoir/reproduce-node-48771
How often does it reproduce? Is there a required condition?
What is the expected behavior? Why is that the expected behavior?
It should be possible to catch the error and exit with 0.
What do you see instead?
The error is not caught, and the process exits with 1.
Additional information
No response