Bug 1329

Summary: stale control sockets prevent connection multiplexing.
Product: Portable OpenSSH
Component: ssh
Status: CLOSED FIXED
Severity: normal
Priority: P2
Version: 5.0p1
Hardware: Other
OS: All
Reporter: David Woodhouse <dwmw2>
Assignee: Assigned to nobody <unassigned-bugs>
CC: djm, jfch
Bug Depends on:
Bug Blocks: 1208, 1349
Attachments (Description | Flags):
clean up stale control sockets | none
fall back from mux client to TCP connection on error | none
fall back from mux client to TCP connection on error | none
Patch to unlink stale socket, against 5.6p1 | none
Remove stale socket only if ControlMaster=auto | none

Description David Woodhouse 2007-07-06 01:19:52 AEST
Created attachment 1318 [details]
clean up stale control sockets

If there's a stale socket lying around, we should remove it rather than just failing to connect to it and then aborting.
Comment 1 Damien Miller 2008-06-12 16:51:31 AEST
What about cases where the mux server has exceeded its connection backlog? Wouldn't you end up zapping a live mux socket there?
Comment 2 Damien Miller 2008-06-12 16:55:07 AEST
Created attachment 1513 [details]
fall back from mux client to TCP connection on error

I think this approach is safer: fall back to creating a new TCP connection after errors in the mux client path.
Comment 3 Damien Miller 2008-06-12 16:57:28 AEST
Created attachment 1514 [details]
fall back from mux client to TCP connection on error

I think this approach is safer: fall back to creating a new TCP connection after errors in the mux client path.
Comment 4 Damien Miller 2008-06-13 10:17:49 AEST
patch applied; will be in openssh-5.1
Comment 5 Damien Miller 2008-07-22 12:19:01 AEST
Mass update RESOLVED->CLOSED after release of openssh-5.1
Comment 6 David Woodhouse 2010-06-04 20:04:36 AEST
(In reply to comment #1)
> what about cases where the mux server has exceeded its connection
> backlog? wouldn't you end up zapping a live mux socket there?

I don't think so. In that case, you'll get a fairly long (perhaps infinite?) timeout on connect() followed by -EAGAIN.

You could _only_ get -ECONNREFUSED if there really isn't anything listening, I believe.
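
The ECONNREFUSED claim above can be checked with a small C sketch. It assumes Linux semantics for AF_UNIX stream sockets, and `fill_addr`, `make_stale_socket` and `probe_socket` are hypothetical helper names, not OpenSSH functions. It leaves a socket file behind with nothing listening, as a killed mux master might, and reports the errno that connect() returns.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill a sockaddr_un for `path`. Sketch-only precondition: the path fits. */
static void fill_addr(struct sockaddr_un *addr, const char *path)
{
    assert(strlen(path) < sizeof(addr->sun_path));
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strncpy(addr->sun_path, path, sizeof(addr->sun_path) - 1);
}

/* Hypothetical helper: bind a socket file, then close the descriptor
 * without unlinking, so the file stays but nothing listens on it.
 * Returns 0 on success, -1 on failure. */
static int make_stale_socket(const char *path)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;
    fill_addr(&addr, path);
    unlink(path);               /* start from a clean slate */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        close(fd);
        return -1;
    }
    close(fd);                  /* closing does not remove the file */
    return 0;
}

/* Hypothetical helper: try to connect; return 0 or the errno value. */
static int probe_socket(const char *path)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    int err = 0;

    if (fd == -1)
        return errno;
    fill_addr(&addr, path);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1)
        err = errno;
    close(fd);
    return err;
}
```

On Linux, probing the path left by `make_stale_socket` should report ECONNREFUSED, and probing again after unlink() should report ENOENT. The backlog-full case is not exercised by this sketch.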

(In reply to comment #3)
> I think this approach is safer: fall back to creating a new TCP
> connection after errors in the mux client path.

The problem with this approach is that when there are stale sockets lying around, it looks like nothing will ever clean them up. So with 'ControlMaster auto' you will keep falling back to TCP connections, never using the mux socket and never creating a new one, forever.

I much prefer the option of deleting the offending socket.

On the other hand, perhaps we just don't want sockets to appear in the file system at all -- perhaps we should allow the user to use 'abstract' socket addresses.... see bug #1775.
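
For readers unfamiliar with abstract socket addresses, here is a minimal Linux-only sketch in C. The helper name `listen_abstract` and the label passed to it are hypothetical, and this is not how OpenSSH handles its control socket. It only illustrates that an address whose `sun_path` begins with a NUL byte lives outside the file system.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Linux-only sketch: listen on an abstract AF_UNIX address.
 * `name` is a hypothetical label. Returns the descriptor or -1. */
static int listen_abstract(const char *name)
{
    struct sockaddr_un addr;
    size_t len = strlen(name);
    socklen_t addrlen;
    int fd;

    /* Sketch-only precondition: non-empty and fits after the NUL. */
    assert(len > 0 && len < sizeof(addr.sun_path));

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    /* sun_path[0] stays '\0', which marks the address as abstract. */
    memcpy(addr.sun_path + 1, name, len);
    addrlen = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + len);

    if (bind(fd, (struct sockaddr *)&addr, addrlen) == -1 ||
        listen(fd, 8) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

While the returned descriptor is open, a second bind to the same name fails. Once it is closed, the kernel releases the name, so there is no file left behind to go stale.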
Comment 7 David Woodhouse 2011-03-10 02:33:59 AEDT
This *used* to work with my old patches; the stale control socket would be removed.

 $ ssh macbook
Control socket connect(/home/dwmw2/.ssh/sockets/macbook-22-dwmw2): Connection refused
ControlSocket /home/dwmw2/.ssh/sockets/macbook-22-dwmw2 already exists, disabling multiplexing
[dwmw2@macbook ~]$ logout
Connection to macbook closed.
 $ ssh macbook
Control socket connect(/home/dwmw2/.ssh/sockets/macbook-22-dwmw2): Connection refused
ControlSocket /home/dwmw2/.ssh/sockets/macbook-22-dwmw2 already exists, disabling multiplexing
[dwmw2@macbook ~]$
Comment 8 David Woodhouse 2011-05-20 22:22:58 AEST
Created attachment 2050 [details]
Patch to unlink stale socket, against 5.6p1

This patch fixes the problem.
Comment 9 David Woodhouse 2011-05-23 19:46:01 AEST
Created attachment 2051 [details]
Remove stale socket only if ControlMaster=auto

This version is modified to further address the fear in comment #1, even though I don't think it's valid, as I explained in comment #6.

It will now *only* remove the non-responsive socket if a replacement socket is going to be automatically recreated (i.e. ControlMaster set to auto or autoask).
Comment 10 Damien Miller 2011-06-03 12:20:33 AEST
IIRC this was fixed in 5.8. We have this code now:

> if (errno == ECONNREFUSED &&
>     options.control_master != SSHCTL_MASTER_NO) {
>         debug("Stale control socket %.100s, unlinking", path);
>         unlink(path);
> }
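
To compare the condition quoted above with the narrower one described in comment 9, here is a small sketch in C. The enum and the function names are hypothetical stand-ins for illustration, not the OpenSSH definitions, and the functions only decide whether to unlink. They do not touch the file system.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the client's ControlMaster setting. */
enum master_mode { MODE_NO, MODE_YES, MODE_ASK, MODE_AUTO, MODE_AUTO_ASK };

/* Condition in the style of the snippet quoted in comment 10:
 * unlink on ECONNREFUSED unless ControlMaster is "no". */
static bool unlink_unless_no(int connect_errno, enum master_mode mode)
{
    return connect_errno == ECONNREFUSED && mode != MODE_NO;
}

/* Narrower condition described in comment 9:
 * unlink only when a replacement socket will be created automatically. */
static bool unlink_if_auto(int connect_errno, enum master_mode mode)
{
    return connect_errno == ECONNREFUSED &&
        (mode == MODE_AUTO || mode == MODE_AUTO_ASK);
}

/* Usage: the two conditions differ only for the explicit master modes. */
static void compare_policies(void)
{
    assert(unlink_unless_no(ECONNREFUSED, MODE_YES));
    assert(!unlink_if_auto(ECONNREFUSED, MODE_YES));
    assert(unlink_if_auto(ECONNREFUSED, MODE_AUTO));
    assert(!unlink_unless_no(EAGAIN, MODE_AUTO));
}
```

Both variants leave the socket alone for any error other than ECONNREFUSED, which is the case comment 6 argues cannot arise from a live but busy mux server.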
Comment 11 Damien Miller 2011-09-06 15:33:00 AEST
close resolved bugs now that openssh-5.9 has been released