Bug 2420 - Race condition regarding ControlPersist and ControlMaster=auto
Summary: Race condition regarding ControlPersist and ControlMaster=auto
Status: CLOSED WORKSFORME
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: ssh
Version: 6.6p1
Hardware: amd64 Linux
Importance: P5 normal
Assignee: Assigned to nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-07-03 16:48 AEST by Dominik Klein
Modified: 2016-08-02 10:42 AEST
CC List: 2 users

See Also:


Attachments

Description Dominik Klein 2015-07-03 16:48:17 AEST
There seems to be a race condition in ssh between looking for an available control socket and actually using it. If, between those two steps, the original ssh client removes the socket because ControlPersist has timed out, the socket is gone by the time the new ssh tries to use it, and ssh fails to connect.
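The two steps described above form a check-then-use window. The toy script below only models that hypothesis; it is not OpenSSH code. A temporary regular file stands in for the control socket, a background job plays the master whose ControlPersist has expired, the window is exaggerated to two seconds, and `demo_race` is a hypothetical name. The fallback branch reflects what comment 1 says ssh does when the socket is missing.

```shell
#!/bin/sh
# Toy model of the check-then-use window described in the report.
# Assumptions: a plain temporary file stands in for the control socket,
# a background job plays the expiring master, and demo_race is a
# hypothetical name. This is NOT how OpenSSH is implemented.
demo_race() {
  sock=$(mktemp) || return 1

  # "Master": remove the socket after 1 second, as if ControlPersist expired.
  ( sleep 1; rm -f "$sock" ) >/dev/null 2>&1 &

  # Step 1: the new client looks for an existing master.
  if [ -e "$sock" ]; then
    echo "lookup: control socket found"
  else
    echo "lookup: control socket does not exist"
  fi

  # The window between the two steps, exaggerated to 2 seconds.
  sleep 2

  # Step 2: the client tries to use what it found; if that fails,
  # it falls back to a new direct connection.
  if cat "$sock" >/dev/null 2>&1; then
    echo "use: reusing existing master"
  else
    echo "use: socket gone, falling back to a new connection"
  fi

  wait            # reap the background "master"
  rm -f "$sock"   # clean up in case it was not removed
}
```

Running `demo_race` shows the lookup succeeding and the later use failing. In real ssh the two steps would be far closer together, so any such window would be much narrower than in this toy.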

How to reproduce:

* ssh -o ControlMaster=auto -o ControlPersist=$timeout -o ControlPath=$controlpath $server
* exit
* sleep $timeout
* ssh -o ControlMaster=auto -o ControlPersist=$timeout -o ControlPath=$controlpath $server
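Because the failure is intermittent, it may help to run the steps above in a loop. The sketch below is a hypothetical helper, not part of OpenSSH: it replaces the interactive login and `exit` with a one-shot remote `true`, expects the timeout in plain seconds, and `SSH_BIN`, `mux_attempt` and `repro_loop` are names invented for illustration (`SSH_BIN` defaults to `ssh` and only exists so the command can be swapped out for testing).

```shell
#!/bin/sh
# Hypothetical reproduction helper. SSH_BIN, mux_attempt and repro_loop
# are illustrative names, not OpenSSH features.

# One multiplexed attempt: run `true` remotely instead of an
# interactive shell followed by `exit`.
# $1 = server, $2 = ControlPersist timeout in seconds, $3 = ControlPath
mux_attempt() {
  "${SSH_BIN:-ssh}" -v \
    -o ControlMaster=auto \
    -o ControlPersist="$2" \
    -o ControlPath="$3" \
    "$1" true
}

# Connect, wait for ControlPersist to expire, connect again, and so on.
# $1 = server, $2 = timeout in seconds, $3 = ControlPath, $4 = iterations
repro_loop() {
  i=0
  while [ "$i" -lt "$4" ]; do
    mux_attempt "$1" "$2" "$3" || { echo "attempt $i failed"; return 1; }
    sleep "$2"
    i=$((i + 1))
  done
  echo "no failure in $4 iterations"
}
```

For example, `repro_loop "$server" 5 "$controlpath" 100` would try 100 times; the `-v` output of a failing attempt can then be compared with the log in the report.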

The second ssh client will occasionally (when the race condition hits) say something like:

debug1: auto-mux: Trying existing master
debug1: Control socket "/home/klein/.ansible/cp/ansible-ssh-alice-22-klein" does not exist

I am not a programmer; I have not read any code and I do not know the ssh client code. But these logs seem to confirm my assumption.

My idea for a fix would be: look for a socket; if one is found, find the original ssh process, tell it to keep the socket open because I want to use it, and then use the socket. I am not sure whether that is doable; please take it as an administrator's idea, not a programmer's.
Comment 1 Damien Miller 2015-07-17 12:38:53 AEST
If you are seeing that error then the socket has already been removed and the client should fall back to creating a new one. What behaviour are you seeing instead?
Comment 2 Dominik Klein 2015-07-21 15:37:35 AEST
This was found in an ansible environment. This is the debug log I got:

<snippedhostname> ConnectTimeout=10 PasswordAuthentication=no KbdInteractiveAuthentication=no ControlPath=/home/klein/.ansible/cp/ansible-ssh-%h-%p-%r PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey ControlMaster=auto ControlPersist=5s
fatal: [snippedhostname] => SSH encountered an unknown error. The output was:
OpenSSH_6.6.1, OpenSSL 1.0.1f 6 Jan 2014
debug1: Reading configuration data /home/klein/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: auto-mux: Trying existing master
debug1: Control socket "/home/klein/.ansible/cp/ansible-ssh-snippedhostname-22-klein" does not exist
debug2: ssh_connect: needpriv 0
debug1: Connecting to snippedhostname [217.116.120.20] port 22.
debug2: fd 3 setting O_NONBLOCK
debug1: fd 3 clearing O_NONBLOCK
debug1: Connection established.
debug3: timeout: 9997 ms remain after connect
debug1: identity file /home/klein/.ssh/id_rsa type -1
debug1: identity file /home/klein/.ssh/id_rsa-cert type -1
debug3: Incorrect RSA1 identifier
debug3: Could not load "/home/klein/.ssh/id_dsa" as a RSA1 public key
debug1: identity file /home/klein/.ssh/id_dsa type 2
debug1: identity file /home/klein/.ssh/id_dsa-cert type -1
debug1: identity file /home/klein/.ssh/id_ecdsa type -1
debug1: identity file /home/klein/.ssh/id_ecdsa-cert type -1
debug1: identity file /home/klein/.ssh/id_ed25519 type -1
debug1: identity file /home/klein/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2
ssh_exchange_identification: Connection closed by remote host
Comment 3 Damien Miller 2015-07-23 10:33:46 AEST
(In reply to Dominik Klein from comment #2)
> This was found in an ansible environment. This is the debug log I
> got:

This failure wasn't caused by the multiplexing code.


> debug1: auto-mux: Trying existing master
> debug1: Control socket
> "/home/klein/.ansible/cp/ansible-ssh-snippedhostname-22-klein" does
> not exist

ssh tried to find a socket; it wasn't there.

> debug1: Connecting to snippedhostname [217.116.120.20] port 22.

ssh falls back to connecting normally.

> ssh_exchange_identification: Connection closed by remote host

The server hung up on it unexpectedly.
Comment 4 Darren Tucker 2016-02-05 13:43:57 AEDT
(In reply to Damien Miller from comment #3)
[...]
> > ssh_exchange_identification: Connection closed by remote host
> 
> The server hung up on it unexpectedly

Maybe the server hit its MaxStartups limit?  Maybe try bumping it in the server's sshd_config.
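For context on the MaxStartups suggestion: sshd_config(5) documents MaxStartups either as a single limit or as `start:rate:full` (documented default `10:30:100`). Once `start` unauthenticated connections are pending, sshd refuses new attempts with probability `rate`/100, and that probability rises linearly until every attempt is refused at `full`. A refused client simply sees the connection closed, which would be consistent with the `ssh_exchange_identification` error above, though the report does not establish that this is what happened. The hypothetical `drop_pct` helper below only evaluates that documented linear rule in integer arithmetic; it is not taken from sshd's source.

```shell
#!/bin/sh
# drop_pct START RATE FULL PENDING  (hypothetical helper)
# Prints the percentage chance that a new connection is refused when
# PENDING unauthenticated connections are already open, following the
# linear rule documented for MaxStartups start:rate:full.
drop_pct() {
  start=$1 rate=$2 full=$3 pending=$4
  if [ "$pending" -lt "$start" ]; then
    echo 0      # below start: never refused
  elif [ "$pending" -ge "$full" ]; then
    echo 100    # at or above full: always refused
  else
    # rate% at start, rising linearly to 100% at full
    echo $(( rate + (100 - rate) * (pending - start) / (full - start) ))
  fi
}
```

With the default `10:30:100`, `drop_pct 10 30 100 55` prints 65, so a burst of parallel connections (as Ansible can open) may see a large share refused. Raising `start` in the server's sshd_config moves that threshold up.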
Comment 5 Damien Miller 2016-08-02 10:42:57 AEST
Close all resolved bugs after 7.3p1 release