(Net.js) Default happy eyeballs algorithm cause my request fail with a server in other side of the plannet #52216

Closed
SSANSH opened this issue Mar 26, 2024 · 6 comments · Fixed by #52474 or #52492
Labels
net Issues and PRs related to the net subsystem.

Comments

@SSANSH

SSANSH commented Mar 26, 2024

Version

v20.11.1

Platform

Linux testserver 3.10.0-1160.114.2.el7.x86_64 #1 SMP Sun Mar 3 08:18:39 EST 2024 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

Net

What steps will reproduce the bug?

```js
const { Socket } = require('net');
const sock = new Socket();
sock.connect({
  host: 'latencyontheothersideoftheplanet.com',
  port: 4222,
});
sock.on('connect', () => {
  console.log('Connected to server');
  // Send data to the server
  sock.write('Hello, server!');
});
sock.on('data', (data) => {
  console.log(`Received data from server: ${data}`);

  // Close the socket
  sock.end();
});
sock.on('end', () => {
  console.log('Disconnected from server');
});
```
I am trying to connect from a server in Japan to a server in Europe.
Since Node 20.0.0, the default value of autoSelectFamily in socket.connect() has been true instead of false.
In this case the latency is too high and the default "let autoSelectFamilyAttemptTimeoutDefault = 250;" defined in net.js is too low, so the connection fails every time: with a timeout for IPv4 (raised by the algorithm's attempt timeout) and with ENETUNREACH for IPv6 (which is expected, as I don't have an IPv6 network).
As a workaround, I set:

```
export NODE_OPTIONS=--no-network-family-autoselection
```

This option disables the "happy eyeballs" algorithm that has been enabled by default since Node.js 20.
I know the timeout can be increased with "setDefaultAutoSelectFamilyAttemptTimeout", but that requires a code change.
I think it would be good to expose this parameter as an environment variable, to limit the impact of this undocumented breaking change.
One more point: if no IPv6 network is available, I think there is no need to apply the timeout.
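For readers who can ship one extra file but cannot touch the application code, the following is a minimal sketch of a preload script that raises the default attempt timeout from an environment variable. It uses `net.setDefaultAutoSelectFamilyAttemptTimeout()` and `net.getDefaultAutoSelectFamilyAttemptTimeout()` from the `net` module; the variable name `MY_APP_CONNECT_ATTEMPT_TIMEOUT_MS` and the function name are hypothetical and not something Node.js reads on its own.

```javascript
'use strict';
const net = require('node:net');

// Hypothetical variable name, chosen for this example only.
const ENV_NAME = 'MY_APP_CONNECT_ATTEMPT_TIMEOUT_MS';

// Reads the timeout (in milliseconds) from `env` and applies it process-wide.
// Returns the default attempt timeout that is in effect afterwards.
function applyAttemptTimeoutFromEnv(env = process.env) {
  const raw = env[ENV_NAME];
  if (raw === undefined) {
    // Nothing configured: leave the default untouched.
    return net.getDefaultAutoSelectFamilyAttemptTimeout();
  }
  const ms = Number.parseInt(raw, 10);
  if (!Number.isInteger(ms) || ms <= 0) {
    throw new RangeError(`${ENV_NAME} must be a positive integer, got "${raw}"`);
  }
  net.setDefaultAutoSelectFamilyAttemptTimeout(ms);
  return net.getDefaultAutoSelectFamilyAttemptTimeout();
}

// Apply once at load time so the script can be used as a preload module.
applyAttemptTimeoutFromEnv();
```

Assuming the file is saved as `preload.js` (a made-up name), it could be loaded without editing the application via `NODE_OPTIONS="--require /path/to/preload.js"`. This still means deploying a file, so it only partly addresses the request for a plain environment variable.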

How often does it reproduce? Is there a required condition?

You need a server whose connection latency is greater than 250 ms.

What is the expected behavior? Why is that the expected behavior?

I want my code to work as it does on Node.js 18, without a connect timeout against a server that has a slow connection.

What do you see instead?

I see a timeout on IPv4.

Additional information

No response

@SSANSH SSANSH changed the title Default happy eyeballs algorithm cause my request fail Default happy eyeballs algorithm cause my request fail with a server in other side of the plannet Mar 26, 2024
@SSANSH SSANSH changed the title Default happy eyeballs algorithm cause my request fail with a server in other side of the plannet (Net.js) Default happy eyeballs algorithm cause my request fail with a server in other side of the plannet Mar 26, 2024
@tniessen tniessen added the net Issues and PRs related to the net subsystem. label Mar 26, 2024
@nwalters512

nwalters512 commented Apr 2, 2024

EDIT: for continuity, some of this conversation is being continued on https://github.com/orgs/nodejs/discussions/48028#discussioncomment-7926376.

I ran into this too on Node v20.12.0. For me, this occurs when node-gyp attempts to download https://nodejs.org/download/release/v20.12.0/node-v20.11.0-headers.tar.gz. This is occurring with a server located in China, so it's not surprising that there's >250ms network latency as packets traverse the Great Firewall. It's reproducible in isolation in a REPL:

$ node
Welcome to Node.js v20.12.0.
Type ".help" for more information.
> fetch('https://nodejs.org/download/release/v20.12.0/node-v20.11.0-headers.tar.gz').then(console.log).catch(err => console.dir(err, { depth: null }));
Promise {
  <pending>,
  [Symbol(async_id_symbol)]: 53,
  [Symbol(trigger_async_id_symbol)]: 52
}
> TypeError: fetch failed
    at node:internal/deps/undici/undici:12345:11
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  cause: AggregateError [ETIMEDOUT]: 
      at internalConnectMultiple (node:net:1116:18)
      at internalConnectMultiple (node:net:1184:5)
      at Timeout.internalConnectMultipleTimeout (node:net:1710:5)
      at listOnTimeout (node:internal/timers:575:11)
      at process.processTimers (node:internal/timers:514:7) {
    code: 'ETIMEDOUT',
    [errors]: [
      Error: connect ETIMEDOUT 104.20.22.46:443
          at createConnectionError (node:net:1646:14)
          at Timeout.internalConnectMultipleTimeout (node:net:1705:38)
          at listOnTimeout (node:internal/timers:575:11)
          at process.processTimers (node:internal/timers:514:7) {
        errno: -110,
        code: 'ETIMEDOUT',
        syscall: 'connect',
        address: '104.20.22.46',
        port: 443
      },
      Error: connect ENETUNREACH 2606:4700:10::6814:162e:443 - Local (:::0)
          at internalConnectMultiple (node:net:1180:16)
          at Timeout.internalConnectMultipleTimeout (node:net:1710:5)
          at listOnTimeout (node:internal/timers:575:11)
          at process.processTimers (node:internal/timers:514:7) {
        errno: -101,
        code: 'ENETUNREACH',
        syscall: 'connect',
        address: '2606:4700:10::6814:162e',
        port: 443
      },
      Error: connect ETIMEDOUT 104.20.23.46:443
          at createConnectionError (node:net:1646:14)
          at Timeout.internalConnectMultipleTimeout (node:net:1705:38)
          at listOnTimeout (node:internal/timers:575:11)
          at process.processTimers (node:internal/timers:514:7) {
        errno: -110,
        code: 'ETIMEDOUT',
        syscall: 'connect',
        address: '104.20.23.46',
        port: 443
      },
      Error: connect ENETUNREACH 2606:4700:10::6814:172e:443 - Local (:::0)
          at internalConnectMultiple (node:net:1180:16)
          at Timeout.internalConnectMultipleTimeout (node:net:1710:5)
          at listOnTimeout (node:internal/timers:575:11)
          at process.processTimers (node:internal/timers:514:7) {
        errno: -101,
        code: 'ENETUNREACH',
        syscall: 'connect',
        address: '2606:4700:10::6814:172e',
        port: 443
      }
    ]
  }
}

Like OP, I was able to resolve this by setting NODE_OPTIONS=--no-network-family-autoselection. Because this occurs in node-gyp (and the actual network request happens many dependencies down), there's no way for me to add autoSelectFamilyAttemptTimeout to a socket.connect call.

I'm able to download the file just fine with both Curl and Python, so I know this isn't a total network failure or reachability problem. The above error only happens in Node.

Anecdotally, it only started occurring after upgrading to Node v20.12.0, though I'm not sure if that's just a coincidence. Is it possible that nodejs.org was recently updated to support IPv6, and that's why we're only seeing this now? I don't know exactly when family autoselection occurs. It's also possible that the Great Firewall adds more delay some days than others.

@tniessen
Member

tniessen commented Apr 6, 2024

After discussing this with @nwalters512 here, I am pretty sure this is due to #48145.

@ShogunPanda
Contributor

ShogunPanda commented Apr 10, 2024

@nwalters512 Can you please provide the result of node -e "require('dns').lookup('nodejs.org', {all: true}, console.log)"?

From what I see in the aggregate error, this might not be a bug but an unfortunate DNS ordering problem.
If you look at the aggregate error above, you can see that the system tries the addresses in this order: IPv4, IPv6, IPv4, IPv6.
All of this is normal.
To keep behavior similar to what it was before Happy Eyeballs, I enforce autoSelectFamilyAttemptTimeout on every attempted address except the last one.
But your last attempt was not an IPv4 address (which would have succeeded) but an IPv6 one.
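The ordering effect described in this comment can be illustrated with a small simulation. This is a simplified model and not Node's internal implementation: it assumes addresses are tried with alternating families, that every IPv6 attempt fails immediately with ENETUNREACH (no IPv6 route), and that the attempt timeout applies to every attempt except the last. The function names and the 300 ms IPv4 connect time are made up for the example.

```javascript
'use strict';

// Alternate address families, starting with the family of the first result.
function interleaveByFamily(addresses) {
  if (addresses.length === 0) return [];
  const firstFamily = addresses[0].family;
  const primary = addresses.filter((a) => a.family === firstFamily);
  const secondary = addresses.filter((a) => a.family !== firstFamily);
  const ordered = [];
  for (let i = 0; i < Math.max(primary.length, secondary.length); i++) {
    if (i < primary.length) ordered.push(primary[i]);
    if (i < secondary.length) ordered.push(secondary[i]);
  }
  return ordered;
}

// Model: IPv6 is unreachable; IPv4 needs `ipv4ConnectMs` to connect; the
// attempt timeout is enforced on every attempt except the last one.
function simulateAttempts(addresses, { attemptTimeoutMs, ipv4ConnectMs }) {
  const ordered = interleaveByFamily(addresses);
  const errors = [];
  for (let i = 0; i < ordered.length; i++) {
    const { address, family } = ordered[i];
    const isLast = i === ordered.length - 1;
    if (family === 6) {
      errors.push(`ENETUNREACH ${address}`);
    } else if (!isLast && ipv4ConnectMs > attemptTimeoutMs) {
      errors.push(`ETIMEDOUT ${address}`);
    } else {
      return { connected: address, errors };
    }
  }
  return { connected: null, errors };
}

const ipv4 = [
  { address: '104.20.22.46', family: 4 },
  { address: '104.20.23.46', family: 4 },
];
const ipv6 = [
  { address: '2606:4700:10::6814:162e', family: 6 },
  { address: '2606:4700:10::6814:172e', family: 6 },
];
const settings = { attemptTimeoutMs: 250, ipv4ConnectMs: 300 };

// IPv4 listed first: the last attempt is IPv6, so nothing connects.
const ipv4FirstResult = simulateAttempts([...ipv4, ...ipv6], settings);
// IPv6 listed first: the last attempt is IPv4 and has no timeout.
const ipv6FirstResult = simulateAttempts([...ipv6, ...ipv4], settings);
console.log(ipv4FirstResult, ipv6FirstResult);
```

Under these assumptions the IPv4-first ordering reproduces the four-error pattern in the AggregateError above, while the IPv6-first ordering ends on an IPv4 attempt that is allowed to complete.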

@tniessen To prove this, I've created a quick test (note that if your system is able to connect before 10ms this test might fail):

'use strict';

const common = require('../common');
const { createMockedLookup } = require('../common/dns');

const assert = require('assert');
const { createConnection, createServer } = require('net');

const ipv4Addresses = ['104.20.22.46', '104.20.23.46'];
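// Addresses only; the ":443" suffix shown in the error output above is the port, not part of the address.
const ipv6Addresses = ['2606:4700:10::6814:162e', '2606:4700:10::6814:172e'];

// This test will have an IPv6 address as the last attempted address. With no IPv6 connectivity, the attempt is expected to fail.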
{
  const connection = createConnection({
    host: 'nodejs.org',
    port: 443,

    lookup: createMockedLookup(...ipv4Addresses, ...ipv6Addresses),
    autoSelectFamily: true,
    autoSelectFamilyAttemptTimeout: 10,
  });

  connection.on('ready', common.mustNotCall());
  connection.on('error', common.mustCall((error) => {
    assert.strictEqual(error.errors[3].address, ipv6Addresses[1]);
  }));
}

// This test will have an IPv4 address as the last attempted address. That attempt has no timeout enforced, so it is expected to succeed.
{
  const connection = createConnection({
    host: 'nodejs.org',
    port: 443,

    lookup: createMockedLookup(...ipv6Addresses, ...ipv4Addresses),
    autoSelectFamily: true,
    autoSelectFamilyAttemptTimeout: 10,
  });

  connection.on('ready', common.mustCall(() => {
    connection.end();
  }));
  connection.on('error', common.mustNotCall());
}

In order to solve this configuration problem we have two ways, both unfortunately still to be implemented:

  1. Have a CLI (and thus NODE_OPTIONS) flag for autoSelectFamilyAttemptTimeoutDefault.
  2. Support --dns-result-order=ipv6first as well.

So, this is not directly related to #48145 but surely needs action.
I will create a PR soon.
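Until something like option 2 exists, code that controls its own connect options could approximate it with a custom `lookup` function. The sketch below wraps `dns.lookup` and moves IPv6 results to the front, so that the final attempt lands on an IPv4 address. It relies on the "no timeout on the last attempt" behavior described in this comment, which is an implementation detail that may change; the helper names `orderIpv6First` and `ipv6FirstLookup` are hypothetical.

```javascript
'use strict';
const dns = require('node:dns');

// Put IPv6 results first while keeping the relative order within each family.
function orderIpv6First(addresses) {
  const v6 = addresses.filter((a) => a.family === 6);
  const rest = addresses.filter((a) => a.family !== 6);
  return [...v6, ...rest];
}

// Drop-in replacement for the `lookup` option of net.connect() and friends.
function ipv6FirstLookup(hostname, options, callback) {
  if (!options?.all) {
    // Single-address lookups have nothing to reorder.
    return dns.lookup(hostname, options, callback);
  }
  dns.lookup(hostname, options, (err, addresses) => {
    if (err) return callback(err);
    callback(null, orderIpv6First(addresses));
  });
}

// Usage (not executed here):
//   net.connect({ host: 'nodejs.org', port: 443, lookup: ipv6FirstLookup });
```

This does not help when the connection is made deep inside a dependency, as in the node-gyp case reported above; there a process-wide flag or default is still needed.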

@tniessen
Member

But your last attempt was not an IPv4 address (which would have succeeded) but an IPv6 one.

@ShogunPanda That was my hypothesis in the discussion linked above as well. I didn't say it's a bug — just known undesirable behavior of the existing implementation.

@nwalters512

@ShogunPanda here's that command and its output run from the machine in question:

$ node -e "require('dns').lookup('nodejs.org', {all: true}, console.log)"
null [
  { address: '104.20.22.46', family: 4 },
  { address: '104.20.23.46', family: 4 },
  { address: '2606:4700:10::6814:162e', family: 6 },
  { address: '2606:4700:10::6814:172e', family: 6 }
]

@ShogunPanda
Contributor

That confirms my opinion.
I'll work on the mitigating PRs this morning.
