Bug 2420

Summary: Race condition regarding ControlPersist and ControlMaster=auto
Product: Portable OpenSSH
Reporter: Dominik Klein <dominik.klein>
Component: ssh
Assignee: Assigned to nobody <unassigned-bugs>
Status: CLOSED WORKSFORME
Severity: normal
CC: djm, dtucker
Priority: P5
Version: 6.6p1
Hardware: amd64
OS: Linux

Description Dominik Klein 2015-07-03 16:48:17 AEST
There seems to be a race condition in ssh between looking for an available control socket and actually using it. If in between those 2 steps the socket is removed by the original ssh client because ControlPersist timed out, the socket will be gone by the time ssh tries to use it and so ssh fails to connect.

How to reproduce:

* ssh -o ControlMaster=auto -o ControlPersist=$timeout -o ControlPath=$controlpath $server
* exit
* sleep $timeout
* ssh -o ControlMaster=auto -o ControlPersist=$timeout -o ControlPath=$controlpath $server
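
The steps above can be sketched as a non-interactive loop, which makes an occasional failure easier to catch. This is only a sketch under assumptions: SSH_BIN, repro_once and repro_loop are names made up for this example, and the remote command "true" stands in for the interactive login followed by "exit".

```shell
#!/bin/sh
# Sketch only: SSH_BIN, repro_once and repro_loop are hypothetical names.
# The remote command "true" replaces the interactive login + exit.
SSH_BIN=${SSH_BIN:-ssh}

# One round: start a persistent master, wait for ControlPersist to expire,
# then connect again with -v so the mux debug lines show up on stderr.
# Prints "ok" or "fail" for the second client.
repro_once() {
    timeout=$1; controlpath=$2; server=$3
    "$SSH_BIN" -o ControlMaster=auto -o ControlPersist="$timeout" \
        -o ControlPath="$controlpath" "$server" true || return 2
    sleep "$timeout"
    if "$SSH_BIN" -v -o ControlMaster=auto -o ControlPersist="$timeout" \
        -o ControlPath="$controlpath" "$server" true; then
        echo ok
    else
        echo fail
        return 1
    fi
}

# Repeat the round and count how many rounds failed.
repro_loop() {
    rounds=$1; shift
    failures=0
    i=0
    while [ "$i" -lt "$rounds" ]; do
        repro_once "$@" >/dev/null || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures/$rounds rounds failed"
}
```

With placeholder values, a call might look like: repro_loop 20 5 "$HOME/.ssh/cp-%h-%p-%r" server.example.org. Sleeping for exactly the ControlPersist time puts the second client close to the moment the master removes its socket, which is the window described above.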

The second ssh client will occasionally (when the race condition hits) say something like

debug1: auto-mux: Trying existing master
debug1: Control socket "/home/klein/.ansible/cp/ansible-ssh-alice-22-klein" does not exist

I am not a programmer, I did not read any code and I do not know the ssh client code. But my assumption seems proven to me by those logs.

My idea for a fix would be to look for a socket and, if one is found, find the original ssh process, tell it to keep the socket open because I want to use it, and then use the socket. I am not sure whether that's doable; take it as an administrator's idea, not a programmer's.
Comment 1 Damien Miller 2015-07-17 12:38:53 AEST
If you are seeing that error then the socket has already been removed and the client should fall back to creating a new one. What behaviour are you seeing instead?
Comment 2 Dominik Klein 2015-07-21 15:37:35 AEST
This was found in an ansible environment. This is the debug log I got:

<snippedhostname> ConnectTimeout=10 PasswordAuthentication=no KbdInteractiveAuthentication=no ControlPath=/home/klein/.ansible/cp/ansible-ssh-%h-%p-%r PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey ControlMaster=auto ControlPersist=5s
fatal: [snippedhostname] => SSH encountered an unknown error. The output was:
OpenSSH_6.6.1, OpenSSL 1.0.1f 6 Jan 2014
debug1: Reading configuration data /home/klein/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: auto-mux: Trying existing master
debug1: Control socket "/home/klein/.ansible/cp/ansible-ssh-snippedhostname-22-klein" does not exist
debug2: ssh_connect: needpriv 0
debug1: Connecting to snippedhostname [217.116.120.20] port 22.
debug2: fd 3 setting O_NONBLOCK
debug1: fd 3 clearing O_NONBLOCK
debug1: Connection established.
debug3: timeout: 9997 ms remain after connect
debug1: identity file /home/klein/.ssh/id_rsa type -1
debug1: identity file /home/klein/.ssh/id_rsa-cert type -1
debug3: Incorrect RSA1 identifier
debug3: Could not load "/home/klein/.ssh/id_dsa" as a RSA1 public key
debug1: identity file /home/klein/.ssh/id_dsa type 2
debug1: identity file /home/klein/.ssh/id_dsa-cert type -1
debug1: identity file /home/klein/.ssh/id_ecdsa type -1
debug1: identity file /home/klein/.ssh/id_ecdsa-cert type -1
debug1: identity file /home/klein/.ssh/id_ed25519 type -1
debug1: identity file /home/klein/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2
ssh_exchange_identification: Connection closed by remote host
Comment 3 Damien Miller 2015-07-23 10:33:46 AEST
(In reply to Dominik Klein from comment #2)
> This was found in an ansible environment. This is the debug log I
> got:

This failure wasn't caused by the multiplexing code.


> debug1: auto-mux: Trying existing master
> debug1: Control socket
> "/home/klein/.ansible/cp/ansible-ssh-snippedhostname-22-klein" does
> not exist

ssh tried to find a socket, but it wasn't there.

> debug1: Connecting to snippedhostname [217.116.120.20] port 22.

ssh falls back to connecting normally

> ssh_exchange_identification: Connection closed by remote host

The server hung up on it unexpectedly
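
The reading of the log above can be turned into a small filter for "ssh -v" output. This is a rough sketch, not part of ssh: classify_ssh_log is a made-up helper name, and the patterns are simply the message texts that appear in the log quoted in comment #2, so they may not match other OpenSSH versions.

```shell
#!/bin/sh
# Sketch only: classify_ssh_log is a hypothetical helper.
# Reads "ssh -v" output on stdin and prints one line per stage it recognises.
classify_ssh_log() {
    log=$(cat)
    # The mux client looked for a control socket and did not find one.
    if printf '%s\n' "$log" | grep -q 'Control socket ".*" does not exist'; then
        echo "mux: no control socket found"
    fi
    # ssh went on to open an ordinary connection.
    if printf '%s\n' "$log" | grep -q '^debug1: Connecting to '; then
        echo "fallback: direct connection attempted"
    fi
    # The server closed the connection during the identification exchange.
    if printf '%s\n' "$log" | grep -q '^ssh_exchange_identification: Connection closed by remote host'; then
        echo "failure: server closed the connection"
    fi
}
```

Fed the log from comment #2, this should print all three lines, which matches the sequence described above: the missing socket is followed by a normal connection attempt, and the failure happens only after that.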
Comment 4 Darren Tucker 2016-02-05 13:43:57 AEDT
(In reply to Damien Miller from comment #3)
[...]
> > ssh_exchange_identification: Connection closed by remote host
> 
> The server hung up on it unexpectedly

Maybe the server hit its MaxStartups limit?  Maybe try bumping it in the server's sshd_config.
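
For a feel of how a MaxStartups value of the form start:rate:full behaves, here is a rough sketch based on the description in sshd_config(5): connections are refused with probability rate/100 once there are start unauthenticated connections, the probability rises linearly, and everything is refused at full. The helper name maxstartups_drop_pct is made up, and the integer interpolation is an assumption of this sketch that may not match sshd's exact rounding.

```shell
#!/bin/sh
# Sketch only: maxstartups_drop_pct is a hypothetical helper.
# Usage: maxstartups_drop_pct START RATE FULL CURRENT
# Prints the approximate refusal probability in percent when CURRENT
# unauthenticated connections are open.
maxstartups_drop_pct() {
    start=$1; rate=$2; full=$3; current=$4
    if [ "$current" -lt "$start" ]; then
        echo 0          # below "start": nothing is refused
    elif [ "$current" -ge "$full" ]; then
        echo 100        # at or above "full": everything is refused
    else
        # linear interpolation between rate% and 100%
        echo $(( rate + (100 - rate) * (current - start) / (full - start) ))
    fi
}
```

For example, with the value 10:30:100, "maxstartups_drop_pct 10 30 100 10" gives 30, so a burst of parallel logins can already see some connections closed once ten of them are still unauthenticated.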
Comment 5 Damien Miller 2016-08-02 10:42:57 AEST
Close all resolved bugs after 7.3p1 release