There seems to be a race condition in ssh between looking for an available control socket and actually using it. If, between those two steps, the original ssh client removes the socket because ControlPersist timed out, the socket will be gone by the time ssh tries to use it, and so ssh fails to connect.

How to reproduce:
* ssh -o ControlMaster=auto -o ControlPersist=$timeout -o ControlPath=$controlpath $server
* exit
* sleep $timeout
* ssh -o ControlMaster=auto -o ControlPersist=$timeout -o ControlPath=$controlpath $server

The second ssh client will occasionally (when the race condition hits) say something like:

debug1: auto-mux: Trying existing master
debug1: Control socket "/home/klein/.ansible/cp/ansible-ssh-alice-22-klein" does not exist

I am not a programmer, I have not read any code, and I do not know the ssh client code. But those logs seem to me to confirm my assumption.

My idea for a fix: look for a socket; if one is found, find the original ssh process, tell it to keep the socket open because I want to use it, and then use the socket. I am not sure whether that is doable. Please take it as an administrator's idea, not a programmer's.
If you are seeing that error then the socket has already been removed and the client should fall back to creating a new one. What behaviour are you seeing instead?
This was found in an ansible environment. This is the debug log I got:

<snippedhostname> ConnectTimeout=10 PasswordAuthentication=no KbdInteractiveAuthentication=no ControlPath=/home/klein/.ansible/cp/ansible-ssh-%h-%p-%r PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey ControlMaster=auto ControlPersist=5s
fatal: [snippedhostname] => SSH encountered an unknown error. The output was:
OpenSSH_6.6.1, OpenSSL 1.0.1f 6 Jan 2014
debug1: Reading configuration data /home/klein/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: auto-mux: Trying existing master
debug1: Control socket "/home/klein/.ansible/cp/ansible-ssh-snippedhostname-22-klein" does not exist
debug2: ssh_connect: needpriv 0
debug1: Connecting to snippedhostname [217.116.120.20] port 22.
debug2: fd 3 setting O_NONBLOCK
debug1: fd 3 clearing O_NONBLOCK
debug1: Connection established.
debug3: timeout: 9997 ms remain after connect
debug1: identity file /home/klein/.ssh/id_rsa type -1
debug1: identity file /home/klein/.ssh/id_rsa-cert type -1
debug3: Incorrect RSA1 identifier
debug3: Could not load "/home/klein/.ssh/id_dsa" as a RSA1 public key
debug1: identity file /home/klein/.ssh/id_dsa type 2
debug1: identity file /home/klein/.ssh/id_dsa-cert type -1
debug1: identity file /home/klein/.ssh/id_ecdsa type -1
debug1: identity file /home/klein/.ssh/id_ecdsa-cert type -1
debug1: identity file /home/klein/.ssh/id_ed25519 type -1
debug1: identity file /home/klein/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2
ssh_exchange_identification: Connection closed by remote host
(In reply to Dominik Klein from comment #2)
> This was found in an ansible environment. This is the debug log I
> got:

This failure wasn't caused by the multiplexing code.

> debug1: auto-mux: Trying existing master
> debug1: Control socket
> "/home/klein/.ansible/cp/ansible-ssh-snippedhostname-22-klein" does
> not exist

ssh tried to find a socket, but it wasn't there.

> debug1: Connecting to snippedhostname [217.116.120.20] port 22.

ssh falls back to connecting normally.

> ssh_exchange_identification: Connection closed by remote host

The server hung up on it unexpectedly.
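The sequence described above (try the control socket, then fall back to a normal connection if it is gone) can be sketched in Python. This is only an illustration, not the ssh client's actual code: `connect_via_master` and `open_session` are hypothetical names, and a plain Unix-domain socket stands in for the ControlPath socket. The sketch attempts the connect directly and treats a missing or dead socket as "no master", so a socket that disappears at the last moment leads to a fallback rather than a failure.

```python
import os
import socket
import tempfile


def connect_via_master(control_path):
    """Return a connected Unix socket, or None if no usable master exists.

    Hypothetical helper: there is no separate "does it exist?" check, so
    there is no window between checking and using the socket.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(control_path)
        return sock
    except (FileNotFoundError, ConnectionRefusedError):
        # Socket file already removed, or nobody is listening on it any more.
        sock.close()
        return None


def open_session(control_path):
    """Report which path a client would take: "mux" or "direct"."""
    sock = connect_via_master(control_path)
    if sock is not None:
        sock.close()
        return "mux"
    # No master available: a real client would open a normal connection here.
    return "direct"
```

With a listener bound to the path, `open_session` returns "mux"; once the listener has closed and the socket file has been removed, it returns "direct", which corresponds to the "Connecting to snippedhostname" step in the log.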
(In reply to Damien Miller from comment #3)
[...]
> > ssh_exchange_identification: Connection closed by remote host
>
> The server hung up on it unexpectedly

Maybe the server hit its MaxStartups limit? Maybe try bumping it in the server's sshd_config.
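For context, sshd_config(5) describes MaxStartups in the form start:rate:full. Once "start" unauthenticated connections are pending, sshd refuses new ones with probability rate/100, and that probability rises linearly until every attempt is refused at "full". The Python sketch below models that documented behaviour. It is not sshd's code, `drop_probability` is a hypothetical name, and the default arguments 10:30:100 are an assumption about the server's configuration that should be checked against the installed sshd_config.

```python
def drop_probability(startups, start=10, rate=30, full=100):
    """Approximate chance that sshd refuses a new unauthenticated connection.

    startups: number of unauthenticated connections currently pending.
    start, rate, full: the three fields of MaxStartups (assumed values).
    """
    if startups < start:
        return 0.0  # below the threshold nothing is dropped
    if startups >= full or rate >= 100:
        return 1.0  # at or beyond "full" every attempt is refused
    # Linear interpolation from rate% at "start" up to 100% at "full".
    percent = rate + (100 - rate) * (startups - start) / (full - start)
    return percent / 100.0
```

Under these assumed values, a burst that leaves 10 connections pending would already see roughly 30% of further attempts refused, which the client reports as the remote host closing the connection.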
Close all resolved bugs after 7.3p1 release