Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to connect to docker Provider over ssh (works outside of Pulumi) #1293

Open
cowwoc opened this issue Nov 29, 2024 · 2 comments
Open
Labels
awaiting-feedback Blocked on input from the author kind/bug Some behavior is incorrect or out of spec

Comments

@cowwoc
Copy link

cowwoc commented Nov 29, 2024

Describe what happened

I've been pulling my hair out trying to figure out why Pulumi is unable to connect to a remote docker engine using ssh (the connection gets refused) but when I take the exact same command-line and run it from cmd.exe it works just fine.

Sample program

const provider = new DockerProvider(`${name}.provider`, {
		host: interpolate`ssh://${SSH_USERNAME}@${node.ipv4Address}:22`,
		sshOpts: [
			"-i", pulumiPrivateKeyFile,
			"-o", "StrictHostKeyChecking=no",
			"-o", `UserKnownHostsFile=${DEV_NULL}`
		]
	},
	{
		parent: node,
		dependsOn: node
	});
[...]
const network = getNetwork({name: networkName}, {provider});

Log output

Pulumi output:

debug: Invoking function: tok=docker:index/getNetwork:getNetwork asynchronously
debug: , obj={"name":"network-internal"}
debug: Invoke RPC finished: tok=docker:index/getNetwork:getNetwork; err: Error: 2 UNKNOWN: invocation of docker:index/getNetwork:getNetwork returned an error: 1 error occurred:
	* Error pinging Docker server: error during connect: Get "http://docker.example.com/_ping": command [ssh -vvv -i C:/Users/Gili/.ssh/licensed-pulumi.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=\\.\NUL -o ConnectTimeout=30 -l root -p 22 -- 134.209.213.246 docker system dial-stdio] has exited with exit status 255, please make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=OpenSSH_for_Windows_9.5p1, LibreSSL 3.8.2
debug1: Reading configuration data C:\\Users\\Gili/.ssh/config
debug3: Failed to open file:C:/ProgramData/ssh/ssh_config error:2
debug2: resolve_canonicalize: hostname 134.209.213.246 is address
debug3: ssh_connect_direct: entering
debug1: Connecting to 134.209.213.246 [134.209.213.246] port 22.
debug2: fd 3 setting O_NONBLOCK
debug3: finish_connect - ERROR: async io completed with error: 10061, io:0000020831916B50
debug1: fd 3 clearing O_NONBLOCK
debug1: Connection established.
debug3: socketio_getpeername - ERROR:10057
debug3: socketio_getpeername - ERROR:10057
debug1: getpeername failed: The socket is not connected
debug3: timeout: 27817 ms remain after connect
debug3: Failed to open file:C:/Users/Gili/.ssh/licensed-pulumi.pem.pub error:2
debug1: identity file C:/Users/Gili/.ssh/licensed-pulumi.pem type -1
debug3: Failed to open file:C:/Users/Gili/.ssh/licensed-pulumi.pem-cert error:2
debug3: Failed to open file:C:/Users/Gili/.ssh/licensed-pulumi.pem-cert.pub error:2
debug3: failed to open file:C:/Users/Gili/.ssh/licensed-pulumi.pem-cert error:2
debug1: identity file C:/Users/Gili/.ssh/licensed-pulumi.pem-cert type -1
debug3: ERROR:10061, io:0000020831916B50
debug1: kex_exchange_identification: write: Connection refused
banner exchange: Connection to UNKNOWN port -1: Connection refused


, resp: undefined

Output when running ssh -vvv -i C:/Users/Gili/.ssh/licensed-pulumi.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=\\.\NUL -o ConnectTimeout=30 -l root -p 22 -- 134.209.213.246 docker system dial-stdio from cmd.exe:

ssh -vvv -i C:/Users/Gili/.ssh/licensed-pulumi.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=\\.\NUL -o ConnectTimeout=30 -l root -p 22 -- 134.209.213.246 docker system dial-stdio
OpenSSH_for_Windows_9.5p1, LibreSSL 3.8.2
debug1: Reading configuration data C:\\Users\\Gili/.ssh/config
debug3: Failed to open file:C:/ProgramData/ssh/ssh_config error:2
debug2: resolve_canonicalize: hostname 134.209.213.246 is address
debug3: ssh_connect_direct: entering
debug1: Connecting to 134.209.213.246 [134.209.213.246] port 22.
debug2: fd 3 setting O_NONBLOCK
debug1: fd 3 clearing O_NONBLOCK
debug1: Connection established.
debug3: timeout: 29972 ms remain after connect
debug3: Failed to open file:C:/Users/Gili/.ssh/licensed-pulumi.pem.pub error:2
debug1: identity file C:/Users/Gili/.ssh/licensed-pulumi.pem type -1
debug3: Failed to open file:C:/Users/Gili/.ssh/licensed-pulumi.pem-cert error:2
debug3: Failed to open file:C:/Users/Gili/.ssh/licensed-pulumi.pem-cert.pub error:2
debug3: failed to open file:C:/Users/Gili/.ssh/licensed-pulumi.pem-cert error:2
debug1: identity file C:/Users/Gili/.ssh/licensed-pulumi.pem-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_for_Windows_9.5
debug1: Remote protocol version 2.0, remote software version OpenSSH_9.2p1 Debian-2+deb12u3
debug1: compat_banner: match: OpenSSH_9.2p1 Debian-2+deb12u3 pat OpenSSH* compat 0x04000000
debug2: fd 3 setting O_NONBLOCK
debug1: Authenticating to 134.209.213.246:22 as 'root'
debug3: Failed to open file:C:/ProgramData/ssh/ssh_known_hosts error:2
debug1: load_hostkeys: fopen __PROGRAMDATA__\\ssh/ssh_known_hosts: No such file or directory
debug3: Failed to open file:C:/ProgramData/ssh/ssh_known_hosts2 error:2
debug1: load_hostkeys: fopen __PROGRAMDATA__\\ssh/ssh_known_hosts2: No such file or directory
debug3: order_hostkeyalgs: no algorithms matched; accept original
debug3: send packet: type 20
debug1: SSH2_MSG_KEXINIT sent
debug3: receive packet: type 20
debug1: SSH2_MSG_KEXINIT received
debug2: local client KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256,[email protected],ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,ext-info-c,[email protected]
debug2: host key algorithms: [email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected],ssh-ed25519,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,[email protected],[email protected],rsa-sha2-512,rsa-sha2-256
debug2: ciphers ctos: [email protected],aes128-ctr,aes192-ctr,aes256-ctr,[email protected],[email protected]
debug2: ciphers stoc: [email protected],aes128-ctr,aes192-ctr,aes256-ctr,[email protected],[email protected]
debug2: MACs ctos: [email protected],[email protected],[email protected],[email protected],[email protected],[email protected],hmac-sha2-256,hmac-sha2-512
debug2: MACs stoc: [email protected],[email protected],[email protected],[email protected],[email protected],[email protected],hmac-sha2-256,hmac-sha2-512
debug2: compression ctos: none,[email protected],zlib
debug2: compression stoc: none,[email protected],zlib
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug2: peer server KEXINIT proposal
debug2: KEX algorithms: [email protected],curve25519-sha256,[email protected],ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,[email protected]
debug2: host key algorithms: rsa-sha2-512,rsa-sha2-256,ecdsa-sha2-nistp256,ssh-ed25519
debug2: ciphers ctos: [email protected],aes128-ctr,aes192-ctr,aes256-ctr,[email protected],[email protected]
debug2: ciphers stoc: [email protected],aes128-ctr,aes192-ctr,aes256-ctr,[email protected],[email protected]
debug2: MACs ctos: [email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected],hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: MACs stoc: [email protected],[email protected],[email protected],[email protected],[email protected],[email protected],[email protected],hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: compression ctos: none,[email protected]
debug2: compression stoc: none,[email protected]
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug3: kex_choose_conf: will use strict KEX ordering
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: ssh-ed25519
debug1: kex: server->client cipher: [email protected] MAC: <implicit> compression: none
debug1: kex: client->server cipher: [email protected] MAC: <implicit> compression: none
debug3: send packet: type 30
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug3: receive packet: type 31
debug1: SSH2_MSG_KEX_ECDH_REPLY received
debug1: Server host key: ssh-ed25519 SHA256:BGp9Du3N2m5623qbLCl5xS04kE/SiC2TTlX4J/qLAa8
debug3: Failed to open file:C:/ProgramData/ssh/ssh_known_hosts error:2
debug1: load_hostkeys: fopen __PROGRAMDATA__\\ssh/ssh_known_hosts: No such file or directory
debug3: Failed to open file:C:/ProgramData/ssh/ssh_known_hosts2 error:2
debug1: load_hostkeys: fopen __PROGRAMDATA__\\ssh/ssh_known_hosts2: No such file or directory
Warning: Permanently added '134.209.213.246' (ED25519) to the list of known hosts.
debug3: send packet: type 21
debug1: ssh_packet_send2_wrapped: resetting send seqnr 3
debug2: ssh_set_newkeys: mode 1
debug1: rekey out after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug3: receive packet: type 21
debug1: ssh_packet_read_poll2: resetting read seqnr 3
debug1: SSH2_MSG_NEWKEYS received
debug2: ssh_set_newkeys: mode 0
debug1: rekey in after 134217728 blocks
debug3: ssh_get_authentication_socket_path: path '\\\\.\\pipe\\openssh-ssh-agent'
debug3: unable to connect to pipe \\\\.\\pipe\\openssh-ssh-agent, error: 2
debug1: get_agent_identities: ssh_get_authentication_socket: No such file or directory
debug1: Will attempt key: C:/Users/Gili/.ssh/licensed-pulumi.pem  explicit
debug2: pubkey_prepare: done
debug3: send packet: type 5
debug3: receive packet: type 7
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,[email protected],ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,[email protected],[email protected],ssh-dss,ssh-rsa,rsa-sha2-256,rsa-sha2-512>
debug1: kex_ext_info_check_ver: [email protected]=<0>
debug3: receive packet: type 6
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug3: send packet: type 50
debug3: receive packet: type 51
debug1: Authentications that can continue: publickey
debug3: start over, passed a different list publickey
debug3: preferred publickey,keyboard-interactive,password
debug3: authmethod_lookup publickey
debug3: remaining preferred: keyboard-interactive,password
debug3: authmethod_is_enabled publickey
debug1: Next authentication method: publickey
debug1: Trying private key: C:/Users/Gili/.ssh/licensed-pulumi.pem
debug3: sign_and_send_pubkey: using [email protected] with ECDSA SHA256:mxuzdvXphlsH4hwV3n9OcFT8Emb2yUZJDZdiysERZZU
debug3: sign_and_send_pubkey: signing using ecdsa-sha2-nistp256 SHA256:mxuzdvXphlsH4hwV3n9OcFT8Emb2yUZJDZdiysERZZU
debug3: send packet: type 50
debug2: we sent a publickey packet, wait for reply
debug3: receive packet: type 52
Authenticated to 134.209.213.246 ([134.209.213.246]:22) using "publickey".
debug1: channel 0: new session [client-session] (inactive timeout: 0)
debug3: ssh_session2_open: channel_new: 0
debug2: channel 0: send open
debug3: send packet: type 90
debug1: Requesting [email protected]
debug3: send packet: type 80
debug1: Entering interactive session.
debug1: pledge: network
debug3: client_repledge: enter
debug3: receive packet: type 80
debug1: client_input_global_request: rtype [email protected] want_reply 0
debug3: receive packet: type 4
debug1: Remote: /root/.ssh/authorized_keys:1: key options: agent-forwarding port-forwarding pty user-rc x11-forwarding
debug3: receive packet: type 91
debug2: channel_input_open_confirmation: channel 0: callback start
debug2: fd 3 setting TCP_NODELAY
debug2: client_session2_setup: id 0
debug1: Sending command: docker system dial-stdio
debug2: channel 0: request exec confirm 1
debug3: send packet: type 98
debug3: client_repledge: enter
debug1: pledge: fork
debug2: channel_input_open_confirmation: channel 0: callback done
debug2: channel 0: open confirm rwindow 0 rmax 32768
debug2: channel 0: rcvd adjust 2097152
debug3: receive packet: type 99
debug2: channel_input_status_confirm: type 99 id 0
debug2: exec request accepted on channel 0
debug3: send packet: type 1
debug1: channel 0: free: client-session, nchannels 1
debug3: channel 0: status: The following connections are open:

I diffed the two outputs and as far as I can tell, they start out the same except that Pulumi gets a "connection refused" error while cmd.exe is able to connect without any problems.

Affected Resource(s)

No response

Output of pulumi about

pulumi about
CLI
Version      3.142.0
Go Version   go1.23.3
Go Compiler  gc

Plugins
KIND      NAME          VERSION
resource  command       1.0.1
resource  digitalocean  4.35.0
resource  docker        4.5.7
language  nodejs        unknown
resource  tls           5.0.9

Host
OS       Microsoft Windows 10 Pro
Version  10.0.19045 Build 19045
Arch     x86_64

This project is written in nodejs: executable='C:\Users\Gili\scoop\apps\nodejs\current\node.exe' version='v23.3.0'

Backend
Name           pulumi.com
URL            https://app.pulumi.com/cowwoc
User           cowwoc
Organizations  cowwoc, licensed
Token type     personal

Pulumi locates its logs in C:\Users\Gili\AppData\Local\Temp by default
warning: Failed to get information about the Pulumi program's dependencies: no package-lock.json or yarn.lock file found (searching upwards from C:\Users\Gili\Documents\licensed-app\deployment)
warning: Failed to get information about the current stack: No current stack

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

@cowwoc cowwoc added kind/bug Some behavior is incorrect or out of spec needs-triage Needs attention from the triage team labels Nov 29, 2024
@cowwoc
Copy link
Author

cowwoc commented Nov 29, 2024

From the looks of things, Pulumi is lying to me. The first time that getNetwork() is invoked, it blocks waiting for the docker node to spin up and the provider to be marked as ready. When this happens, it attempts to connect to the node using ssh.exe. However, shortly thereafter it keeps on returning an immediate failure from getNetwork without actually trying to invoke ssh.exe. I can tell this is the case because I am watching pulumi's processes using Process Explorer.

I am invoking this function, waiting for the node to install Docker, and create the network I am looking up:

/**
 * Looks up a network.
 *
 * @param provider the docker provider
 * @param networkName the name of the network to look up
 * @param timeoutInSeconds the maximum number of seconds to wait for the network
 */
async function getNetworkWithRetry(provider: Provider, networkName: string, timeoutInSeconds: number)
{
	const startTime = performance.now();
	while (true)
	{
		try
		{
			await log.info("Invoking getNetworking...");
			return await getNetwork({name: networkName}, {provider});
		}
		catch (error)
		{
			await log.info("Failed to invoke getNetworking...");
			if (error instanceof Error)
				await log.error(error.message);
			const endTime = performance.now();
			const millisecondsElapsed = endTime - startTime;
			const secondsElapsed = Math.floor(millisecondsElapsed / 1000);
			if (secondsElapsed > timeoutInSeconds)
				throw new Error(`Failed to get network after ${secondsElapsed} seconds`);
			// https://github.com/docker/cli/issues/4221#issuecomment-1519168789
			await log.info(`Waiting for network ${networkName}: ${secondsElapsed} / ${timeoutInSeconds} \
seconds elapsed`);
			await new Promise(resolve => setTimeout(resolve, 5000));
		}
	}
}

As far as I can tell, there is no other way to do this without a retry loop. But if Pulumi isn't going to bother invoking ssh then I'm not sure how I am supposed to retry... Any ideas why it no longer attempts outgoing ssh connections?

I am attempting to connect once every 5 seconds for 3 minutes. The remote host begins responding ~1 minute after launch, but by that time Pulumi stops trying to invoke ssh...

I can see that pulumi-resource-docker.exe launches ssh.exe a couple of times within the first minute, then never launches it again. But I can't figure out where in the source-code this is happening...

@blampe
Copy link
Contributor

blampe commented Dec 4, 2024

@cowwoc your provider's SSH setup looks correct, but I suspect only the build path is currently handling SSH correctly. The getNetwork call is delegated to the upstream Terraform provider and might be hitting #1039.

If you're trying to use the remote to build images, it might be simpler to configure a remote builder and use that with the docker-build provider.

Edit: I wasn't able to reproduce this in #1297. That modifies our remote SSH test to also query the network and it seems to work...

@blampe blampe added awaiting-feedback Blocked on input from the author and removed needs-triage Needs attention from the triage team labels Dec 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-feedback Blocked on input from the author kind/bug Some behavior is incorrect or out of spec
Projects
None yet
Development

No branches or pull requests

2 participants