Using svn over ssh creates a master socket as expected; however, it doesn't remove the socket when the command completes. The next svn command finds the socket, but no process is listening on it.
Created attachment 1159: debug logs + version info
You seem to be using a modified "hpn" SSH client. Can you reproduce this with an unpatched OpenSSH?
Created attachment 1160: debug logs + version info (without hpn patch)
Yes, it is reproducible without the hpn patch.
It looks like svn might be killing its ssh process ungracefully. Could you try running it under strace with "strace -ffo svn.strace svn up"? This should create a couple of svn.strace.$PID files, one of which should contain the trace of the ssh process.
Created attachment 1161: svn.strace
I think you're right: see line 594 of the subversion process trace.
Created attachment 1162: svn.strace.12827 (ssh process 1)
Created attachment 1163: svn.strace.12837 (ssh process 2)
I read somewhere in the Subversion bug records that two transports are normal. cvs-ssh works OK for me, and so does a normal ssh shell with ControlMaster, so I'll file a Subversion bug soon. Thanks for your time.
Cross-references:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=313371
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=335528
Yes, it looks like svn is using SIGKILL to "tidy up" its ssh transport. That is definitely a bug in svn: shutting a transport down using SIGKILL gives the transport no opportunity to clean up after itself. There is nothing that OpenSSH can do to remove the socket in this case as SIGKILL cannot be caught. BTW Debian bug 313371 is unrelated.
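To illustrate the difference between SIGTERM and SIGKILL for a process that owns a control socket, here is a minimal Python sketch (Unix only). A child process stands in for an ssh ControlMaster: it binds a Unix-domain socket and installs a SIGTERM handler that unlinks the socket file. The helper names (socket_left_behind, can_install_handler) and the ctl.sock path are hypothetical; this is not OpenSSH code.

```python
import os
import signal
import subprocess
import sys
import tempfile
import time

# Child process standing in for an ssh ControlMaster (hypothetical stand-in):
# it binds a Unix socket and installs a SIGTERM handler that unlinks it.
CHILD = r"""
import os, signal, socket, sys
path = sys.argv[1]

def cleanup(signum, frame):
    os.unlink(path)            # remove the control socket from disk
    sys.exit(128 + signum)

signal.signal(signal.SIGTERM, cleanup)
srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
srv.bind(path)                 # the socket file appears on disk here
srv.listen(1)
while True:
    signal.pause()             # sleep until a signal arrives
"""


def socket_left_behind(sig, timeout=10.0):
    """Start the child, send it `sig`, and report whether its socket survived."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "ctl.sock")
    child = subprocess.Popen([sys.executable, "-c", CHILD, path])
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):        # wait until the child has bound
        if time.monotonic() > deadline:
            child.kill()
            raise RuntimeError("child never created its socket")
        time.sleep(0.01)
    child.send_signal(sig)
    child.wait()
    left = os.path.exists(path)
    if left:
        os.unlink(path)                    # tidy up after the demo
    os.rmdir(tmpdir)
    return left


def can_install_handler(sig):
    """True if a user-space handler can be installed for `sig`."""
    try:
        previous = signal.signal(sig, signal.SIG_DFL)
    except (OSError, ValueError):
        return False
    signal.signal(sig, previous)           # restore whatever was there
    return True
```

With SIGTERM the handler runs and the socket file disappears; with SIGKILL the child dies immediately and the file stays on disk. Asking the kernel for a SIGKILL handler is rejected outright, which is why ssh cannot clean up in that case.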
(In reply to comment #8) > Cross references > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=313371 That's an excellent example of throwing the baby out with the bathwater: it would appear that svn is using SIGKILL simply because they want the "terminated by signal" messages, but if that's the only reason then all they had to do was use "ssh -q"...
(In reply to comment #10) > they want the "terminated by signal" messages, Doh. Make that "they *don't* want" ... Still, as Damien says there's nothing ssh can do to clean up if svn is SIGKILL'ing it. Maybe ssh could recover more gracefully from a stale control socket?
It seems like the Subversion developers have known about the problem for some time, but obviously haven't fixed it: http://svn.haxx.se/dev/archive-2006-03/1020.shtml http://svn.haxx.se/users/archive-2005-11/0194.shtml
Created attachment 1164: subversion patch
Try this patch to Subversion. It stops Subversion from SIGKILL'ing ssh, and still avoids printing the termination message by passing ssh the "-q" flag. (Completely untested.) If it works, please feed it back to the Subversion developers.
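For comparison, here is a hedged sketch of how a program that spawns ssh as a tunnel could shut it down without SIGKILL. It is not the attached patch; the function names are hypothetical, and the "ssh -q HOST svnserve -t" command line is an assumption about how the tunnel is started. The idea is to escalate: close the tunnel's stdin first, then send SIGTERM, and use SIGKILL only if the process still refuses to exit.

```python
import subprocess


def tunnel_argv(host):
    # Assumed tunnel command line (hypothetical); "-q" stops ssh printing a
    # termination message if it does end up being signalled.
    return ["ssh", "-q", host, "svnserve", "-t"]


def close_tunnel(proc, timeout=5.0):
    """Stop a tunnel process, escalating only if it will not exit.

    1. Close its stdin; ssh forwards the EOF and normally exits once the
       remote command does.
    2. SIGTERM; ssh can still run its own cleanup.
    3. SIGKILL as a last resort; nothing can be cleaned up after this.
    Returns the exit status (negative means "killed by that signal").
    """
    if proc.stdin is not None:
        proc.stdin.close()
    try:
        return proc.wait(timeout)
    except subprocess.TimeoutExpired:
        proc.terminate()               # SIGTERM: catchable
    try:
        return proc.wait(timeout)
    except subprocess.TimeoutExpired:
        proc.kill()                    # SIGKILL: uncatchable
        return proc.wait()
```

A caller would start the tunnel with something like subprocess.Popen(tunnel_argv(host), stdin=subprocess.PIPE, stdout=subprocess.PIPE) and call close_tunnel(proc) when done, so the SIGKILL branch is only reached if ssh hangs.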
(In reply to comment #10)
Thanks Damien. The patch works beautifully; I've tested it well. As per the URL, I've filed a bug with the Subversion folks and attached comparative straces to that bug report.

(In reply to comment #10)
> Maybe ssh could recover more gracefully from a stale control socket?

Yes, probably. I'm sure Subversion isn't the only thing that could kill an ssh client.

(In reply to comment #9)
> BTW Debian bug 313371 is unrelated.

Here Peter is basically saying that ssh is overly verbose on a couple of types of signals, and he provides a patch to make the obvious ones silent.

Thanks Damien and Darren, much appreciated.
Better summary, as per comment #11.
Another case where a stale control socket is left behind: a "LocalCommand" hangs (as in bug #1232) and the user has to kill the ssh client with Ctrl-C during connection setup.
Created attachment 1309: Delay start of listen on multiplex control socket
In response to comment #16: this patch should fix your problem. Can you verify?
Not enough, sorry. The example below used OpenSSH_4.6p1 with the attached patch. An ssh connection was made (and a control socket was created). The ssh connection was then killed with kill -9 from another terminal. An attempt to reconnect failed as shown below:

localhost $ ssh devgentoo
Enter passphrase for key '/home/dan/.ssh/id_dsa':
Last login: Fri Jun 22 08:05:10 2007 from 59.167.43.249
HP ProLiant DL380 G5 Server
dragonheart@woodpecker ~ $ Killed
localhost ~ $ ssh devgentoo
Control socket connect(/home/dan/.ssh/master-dragonheart@dev.gentoo.org:22): Connection refused
Enter passphrase for key '/home/dan/.ssh/id_dsa':
ControlSocket /home/dan/.ssh/master-dragonheart@dev.gentoo.org:22 already exists
localhost ~ $ ssh -v
OpenSSH_4.6p1, OpenSSL 0.9.8d 28 Sep 2006

Recommendation:
a) Delete the control socket when a "connection refused" error occurs (the connection obviously isn't much use anyway). I'm not sure of all the things that could cause a connection refused, but given it's local I don't think there would be too many.
b) Create a different socket name if you get an "already exists" error (like mktemp), though a) would be easier.
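The two errors in the session above can be reproduced without ssh. In this minimal Python sketch (hypothetical helper names, Unix only), a socket is bound and listened on, then closed without unlinking, which approximates what a SIGKILL'd master leaves behind. A client connect() then fails with ECONNREFUSED, and a would-be new master's bind() fails with EADDRINUSE, which appears to be the condition ssh reports as "already exists".

```python
import errno
import socket


def make_stale_socket(path):
    """Leave a socket file with no listener behind, like a SIGKILL'd master."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.bind(path)
    s.listen(1)
    s.close()          # listener gone, but the file at `path` is not removed


def connect_result(path):
    """Errno name a client gets when connecting to `path`, or "OK"."""
    c = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        c.connect(path)
        return "OK"
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    finally:
        c.close()


def bind_result(path):
    """Errno name a would-be new master gets when binding `path`, or "OK"."""
    m = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        m.bind(path)
        return "OK"
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    finally:
        m.close()
```

Note that connect_result alone cannot tell a stale socket from a live master that is temporarily refusing connections; both produce ECONNREFUSED.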
(In reply to comment #18)
> Recommendation:
> a) delete the control socket when a connection refused occurs (the
> connection obviously isn't much use anyway)

I don't think that's the right thing to do. You'll get "connection refused" if the control master has reached its listen backlog limit (e.g. if a bunch of clients attempt to connect to it in quick succession), but in that case it's just a temporary condition.
Ah, I thought there must have been a reason you didn't delete it. How about making "already exists" on an attempted reconnection a warning, and falling back to a new connection?
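A sketch of the fallback policy discussed in the last two comments, under stated assumptions: the function name choose_transport and its three modes are hypothetical, and "direct" simply means the caller should open its own TCP connection without multiplexing. Following the point about listen backlogs, an existing socket file is never deleted; if no master answers and the path cannot be taken over, the client warns and carries on.

```python
import socket
import sys


def choose_transport(path):
    """Decide how to reach the remote host, given a control socket path.

    Returns (mode, sock):
      ("mux", client_sock)     a live master accepted our connection
      ("master", listen_sock)  no socket existed, so we now own `path`
      ("direct", None)         `path` is stale or busy; use plain TCP
    """
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(path)
        return "mux", client
    except OSError:
        # ENOENT (no socket yet) or ECONNREFUSED (stale socket, or a live
        # master whose listen backlog is momentarily full).
        client.close()

    master = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        master.bind(path)
        master.listen(64)
        return "master", master
    except OSError as e:
        master.close()
        # Typically EADDRINUSE: the file exists. Don't unlink it, since it
        # may still belong to a working master.
        print(f"warning: ControlSocket {path} unusable ({e.strerror}), "
              "continuing without multiplexing", file=sys.stderr)
        return "direct", None
```

The design choice is to degrade rather than repair: a stale socket costs one extra TCP connection, whereas deleting a busy master's socket would silently break multiplexing for every later client.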
Something similar to attachment #1309 has been committed, and we are looking at implementing a fallback to a TCP connection on mux client errors.
The patch in attachment #1309 has been applied; it will be in openssh-5.1.
Mass update RESOLVED->CLOSED after release of openssh-5.1